Compositions and methods for T-cell receptor gene assembly

ABSTRACT

Provided herein are compositions and methods for assembling nucleic acid sequences encoding T-cell receptors.

CROSS-REFERENCE

This application is a continuation of International Application No. PCT/US20/26558, filed Apr. 3, 2020, which claims the benefit of U.S. Provisional Patent Application No. 62/829,813, filed Apr. 5, 2019, U.S. Provisional Patent Application No. 62/838,465, filed Apr. 25, 2019, U.S. Provisional Patent Application No. 62/898,053, filed Sep. 10, 2019, and U.S. Provisional Patent Application No. 62/972,231, filed Feb. 10, 2020, each of which is entirely incorporated herein by reference.

SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Oct. 4, 2021, is named 53563-705_301_SL.txt and is 69,220 bytes in size.

BACKGROUND OF THE INVENTION

The T-cell receptor (TCR) can be responsible for the recognition of the antigen-major histocompatibility complex, leading to the initiation of an inflammatory response. Many T cell subsets exist, including cytotoxic T cells and helper T cells. Cytotoxic T cells (also known as CD8+ T cells) kill abnormal cells, for example virus-infected or tumor cells. Helper T cells (also known as CD4+ T cells) aid in the activation and maturation of other immune cells. Both cytotoxic and helper T cells carry out their function subsequent to the recognition of specific target antigens, which triggers their respective responses. The antigen specificity of a T cell can be defined by the TCR expressed on the surface of the T cell. T-cell receptors are heterodimer proteins composed of two polypeptide chains, most commonly an alpha chain and a beta chain, but a minority of T cells can express a gamma and delta chain. The specific amino acid sequence of the TCR and the resultant three-dimensional structure defines the TCR antigen specificity and affinity. The amino acid and coding DNA sequences of the TCR chains for any individual T cell are almost always unique or at very low abundance in an organism's entire TCR repertoire, since there are a vast number of possible TCR sequences. This large sequence diversity can be achieved during T cell development through a number of cellular mechanisms and may be a critical aspect of the immune system's ability to respond to a huge variety of potential antigens.

Analyzing the TCR repertoire may help to gain a better understanding of the immune system features and of the aetiology and progression of diseases, in particular those with unknown antigenic triggers. The extreme diversity of the TCR repertoire and the bipartite nature of TCRs can represent a major analytical challenge. High-throughput sequencing can allow greater sequencing depth and significantly more accurate quantification of TCR clonotype abundance, albeit at a greater expense than spectratyping.

SUMMARY OF THE INVENTION

Provided herein are compositions and methods to assemble nucleic acid sequences encoding natively paired T-cell receptors (TCRs) (or cognate TCR pairs). For example, a TCR can comprise a TCR alpha chain and a TCR beta chain or a TCR can comprise a TCR gamma chain and a TCR delta chain. Sequences encoding natively paired TCRs can be identified using various methods, including but not limited to using single cell barcoding and sequencing technologies. After obtaining the sequences encoding natively paired TCRs, compositions and methods described herein can be used to construct or assemble one or more nucleic acid sequences to express the natively paired TCRs in any given host cell(s) in a quick, high-throughput and cost-effective manner. The one or more nucleic acid sequences can comprise greater than or equal to about 1, 5, 10, 20, 50, 100, 200, 300, 400, 500, 1,000, 1,500, 2,000, 2,500, 3,000, 3,500, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 12,000, 15,000, 20,000, 100,000, 1,000,000, 10,000,000, or more different sequences encoding different TCRs.

In an aspect, the present disclosure provides a method for generating a nucleic acid molecule encoding a T-cell receptor (TCR) chain or portion thereof, comprising: (a) providing at least one nucleic acid molecule comprising a sequence encoding a CDR3 of a TCR chain; (b) providing a plurality of nucleic acid molecules, each nucleic acid molecule of the plurality comprising a sequence derived from a TCR V gene, wherein the plurality of nucleic acid molecules comprises at least two different sequences derived from at least two different TCR V genes; and (c) contacting the at least one nucleic acid molecule of (a) to the plurality of nucleic acid molecules of (b) in a same compartment, wherein the at least one nucleic acid molecule of (a) is capable of linking to a nucleic acid molecule of the plurality of nucleic acid molecules to generate a third nucleic acid molecule comprising the sequence encoding the CDR3 and a sequence derived from one of the at least two different TCR V genes, thereby generating the nucleic acid molecule encoding the TCR chain or portion thereof. In some embodiments, the at least one nucleic acid molecule comprises at least about 2, 5, 10, 20, 50, 100, 200, 300, 400, 500, 1,000, 1,500, 2,000, 2,500, 3,000, 3,500, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 12,000, 15,000, 20,000, 100,000, 1,000,000, 10,000,000, or more different sequences. In some embodiments, the plurality of nucleic acid molecules, each nucleic acid molecule of the plurality comprising a sequence derived from a TCR V gene, comprises at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, or more different sequences derived from different TCR V genes.

In some embodiments, the at least one nucleic acid molecule comprises a first plurality of nucleic acid molecules, wherein each nucleic acid molecule of the first plurality of nucleic acid molecules comprises a sequence encoding a CDR3 of a TCR chain. In some embodiments, the at least one nucleic acid molecule of (a) is capable of specifically linking to a nucleic acid molecule of the plurality of nucleic acid molecules that comprises a sequence derived from any single given TCR V gene of the at least two different TCR V genes. In some embodiments, the at least one nucleic acid molecule further comprises a J region of the TCR chain. In some embodiments, each nucleic acid molecule of the first plurality of nucleic acid molecules further comprises a J region of a TCR chain. In some embodiments, the at least two TCR V genes are human TCR V genes or mouse TCR V genes. In some embodiments, the at least two TCR V genes are selected from the group consisting of a human TRAV1-1, TRAV1-2, TRAV2, TRAV3, TRAV4, TRAV5, TRAV6, TRAV7, TRAV8-1, TRAV8-2, TRAV8-3, TRAV8-4, TRAV8-6, TRAV9-1, TRAV9-2, TRAV10, TRAV12-1, TRAV12-2, TRAV12-3, TRAV13-1, TRAV13-2, TRAV14, TRAV16, TRAV17, TRAV18, TRAV19, TRAV20, TRAV21, TRAV22, TRAV23, TRAV24, TRAV25, TRAV26-1, TRAV26-2, TRAV27, TRAV29, TRAV30, TRAV34, TRAV35, TRAV36, TRAV38-1, TRAV38-2, TRAV39, TRAV40, and TRAV41. In some embodiments, the at least two TCR V genes are selected from the group consisting of a human TRBV2, TRBV3-1, TRBV4-1, TRBV4-2, TRBV4-3, TRBV5-1, TRBV5-4, TRBV5-5, TRBV5-6, TRBV5-8, TRBV6-1, TRBV6-2, TRBV6-3, TRBV6-4, TRBV6-5, TRBV6-6, TRBV6-8, TRBV6-9, TRBV7-2, TRBV7-3, TRBV7-4, TRBV7-6, TRBV7-7, TRBV7-8, TRBV7-9, TRBV9, TRBV10-1, TRBV10-2, TRBV10-3, TRBV11-1, TRBV11-2, TRBV11-3, TRBV12-3, TRBV12-4, TRBV12-5, TRBV13, TRBV14, TRBV15, TRBV16, TRBV18, TRBV19, TRBV20-1, TRBV24-1, TRBV25-1, TRBV27, TRBV28, TRBV29-1, and TRBV30.
In some embodiments, each sequence of the plurality of sequences derived from the at least two different TCR V genes comprises a sequence encoding L-PART1, L-PART2, FR1, CDR1, FR2, CDR2, and/or FR3. In some embodiments, the TCR chain is a TCR alpha chain, a TCR beta chain, a TCR gamma chain, or a TCR delta chain. In some embodiments, the at least one nucleic acid molecule further comprises an additional sequence encoding an additional CDR3 of an additional TCR chain. In some embodiments, the at least one nucleic acid molecule comprises an additional J region of the additional TCR chain. In some embodiments, the sequence encoding the CDR3 and the additional sequence encoding the additional CDR3 are separated by at most 100 nucleotides. In some embodiments, the TCR chain and the additional TCR chain are a cognate pair of TCR chains. In some embodiments, the at least one nucleic acid molecule comprises a connector sequence, which connector sequence is capable of linking the at least one nucleic acid molecule to the nucleic acid molecule of the plurality of nucleic acid molecules to generate the third nucleic acid molecule. In some embodiments, the at least one nucleic acid molecule and the nucleic acid molecule of the plurality of nucleic acid molecules encodes a functional TCR chain or portion thereof. In some embodiments, the nucleic acid molecule of the plurality of nucleic acid molecules comprises an anti-connector sequence, which anti-connector sequence is complementary to the connector sequence of the at least one nucleic acid molecule of (a). In some embodiments, the method further comprises linking the at least one nucleic acid molecule of (a) and the nucleic acid molecule of the plurality of nucleic acid molecules of (b). In some embodiments, linking comprises hybridizing the at least one nucleic acid molecule of (a) and the nucleic acid molecule of the plurality of nucleic acid molecules of (b).
In some embodiments, hybridizing comprises hybridizing the connector sequence of the at least one nucleic acid molecule of (a) with the anti-connector sequence of the nucleic acid molecule of the plurality of nucleic acid molecules of (b). In some embodiments, the method further comprises (i) extending a free 3′ end of the nucleic acid molecule of the plurality of nucleic acid molecules using the at least one nucleic acid molecule of (a) as a template, and/or (ii) extending a free 3′ end of the at least one nucleic acid molecule of (a) using the nucleic acid molecule of the plurality of nucleic acid molecules as a template, to generate the third nucleic acid molecule. In some embodiments, the method further comprises ligating the at least one nucleic acid molecule of (a) and the nucleic acid molecule of the plurality of nucleic acid molecules of (b). In some embodiments, the method further comprises contacting the third nucleic acid molecule with a restriction enzyme to generate a sticky end. In some embodiments, the method further comprises contacting the third nucleic acid molecule with an additional nucleic acid molecule. In some embodiments, the additional nucleic acid molecule encodes a constant region or portion thereof of a TCR chain. In some embodiments, the method further comprises ligating the third nucleic acid molecule and the additional nucleic acid molecule. In some embodiments, a plurality of nucleic acid molecules, each encoding a different TCR chain or portion thereof, are generated in the same compartment. In some embodiments, at least five different nucleic acid molecules of the plurality of nucleic acid molecules are generated in the same compartment. In some embodiments, at least ten different nucleic acid molecules of the plurality of nucleic acid molecules are generated in the same compartment.
In some embodiments, at least 20, 50, 100, 200, 300, 400, 500, 1,000, 1,500, 2,000, 2,500, 3,000, 3,500, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 12,000, 15,000, 20,000, 100,000, 1,000,000, 10,000,000, or more different nucleic acid molecules of the plurality of nucleic acid molecules are generated in the same compartment. In some embodiments, the same compartment is a well, a tube, or a droplet. In some embodiments, the at least one nucleic acid molecule comprises a unique barcode. In some embodiments, the unique barcode is a primer binding site. In some embodiments, the connector sequence comprises a unique barcode. In some embodiments, the unique barcode is a primer binding site.
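By way of non-limiting illustration, the pooled linking step of the foregoing method can be sketched in software. The sketch below is an assumption-laden model, not a description of any particular embodiment: it treats each molecule as an uppercase DNA string, treats linking as exact complementarity between a connector and an anti-connector, and ignores the strand-level details of extension or ligation. All names (`VFragment`, `Cdr3Oligo`, `assemble_pool`) and all sequences are hypothetical placeholders and are not taken from the Sequence Listing.

```python
from typing import Dict, List, NamedTuple, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


class VFragment(NamedTuple):
    name: str            # hypothetical label, not a real V gene name
    body: str            # V-gene-derived sequence upstream of the junction
    anti_connector: str  # written 5'->3'; complementary to one connector


class Cdr3Oligo(NamedTuple):
    name: str
    connector: str       # written 5'->3' on the coding strand
    cdr3_j: str          # sequence encoding the CDR3 (and optionally a J region)


def assemble_pool(
    oligos: List[Cdr3Oligo], v_fragments: List[VFragment]
) -> Dict[str, Tuple[str, str]]:
    """Pair each CDR3 oligo with the V fragment whose anti-connector is
    complementary to the oligo's connector, all within one shared pool.

    Returns {oligo name: (V fragment name, assembled coding sequence)}.
    Oligos without a complementary partner are left unassembled.
    """
    by_anti = {v.anti_connector: v for v in v_fragments}
    assembled: Dict[str, Tuple[str, str]] = {}
    for oligo in oligos:
        partner = by_anti.get(reverse_complement(oligo.connector))
        if partner is None:
            continue
        # The connector region appears once in the product (it is the overlap).
        assembled[oligo.name] = (
            partner.name,
            partner.body + oligo.connector + oligo.cdr3_j,
        )
    return assembled


# Placeholder pool: two V fragments and two CDR3 oligos in one "compartment".
v_pool = [
    VFragment("V_A", "ATGAAA", reverse_complement("ACGTTG")),
    VFragment("V_B", "ATGCCC", reverse_complement("TTGACC")),
]
oligo_pool = [
    Cdr3Oligo("tcr_1", "TTGACC", "TGTGCT"),
    Cdr3Oligo("tcr_2", "ACGTTG", "TGTGGG"),
]
products = assemble_pool(oligo_pool, v_pool)
```

In this model each CDR3 oligo finds exactly one V fragment even though all molecules share a pool, which is the property the connector and anti-connector sequences are described as providing.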

In another aspect, the present disclosure provides a composition comprising (a) a plurality of nucleic acid molecules, wherein each nucleic acid molecule of the plurality of nucleic acid molecules comprises a sequence derived from a T-cell receptor (TCR) V gene and does not comprise a CDR3 sequence, wherein a first nucleic acid molecule of the plurality comprises a first anti-connector sequence and a second nucleic acid molecule of the plurality comprises a second anti-connector sequence, wherein the first anti-connector sequence is different from the second anti-connector sequence, and wherein the sequence derived from a TCR V gene of the first nucleic acid molecule and the second nucleic acid molecule are derived from a different TCR V gene; and (b) at least one nucleic acid molecule comprising a sequence encoding a CDR3 of a TCR chain, wherein the at least one nucleic acid molecule further comprises a first connector sequence complementary to the first anti-connector sequence.

In some embodiments, the composition is a liquid composition. In some embodiments, the plurality of nucleic acid molecules of (a) and the at least one nucleic acid molecule of (b) are in a same compartment. In some embodiments, the sequence derived from the TCR V gene comprises at least ten nucleotides of the TCR V gene. In some embodiments, the TCR V gene is a TRAV gene, a TRBV gene, a TRGV gene, or a TRDV gene. In some embodiments, the sequence derived from the TCR V gene comprises a sequence encoding L-PART1, L-PART2, FR1, CDR1, FR2, CDR2, and/or FR3. In some embodiments, the at least one nucleic acid molecule further comprises a J region of the TCR chain. In some embodiments, the at least one nucleic acid molecule further comprises an additional sequence encoding an additional CDR3 of an additional TCR chain. In some embodiments, the at least one nucleic acid molecule further comprises an additional J region of the additional TCR chain. In some embodiments, the sequence encoding the CDR3 and the additional sequence encoding the CDR3 are separated by at most 100 nucleotides. In some embodiments, the TCR chain and the additional TCR chain are a cognate pair of TCR chains. In some embodiments, the at least one nucleic acid molecule of (b) comprises a first plurality of nucleic acid molecules, and wherein each nucleic acid molecule of the first plurality of nucleic acid molecules comprises a sequence encoding a CDR3 of a TCR chain. In some embodiments, each nucleic acid molecule of the first plurality of nucleic acid molecules encodes a different CDR3 of a different TCR chain. In some embodiments, each nucleic acid molecule of the first plurality of nucleic acid molecules comprises a different connector sequence, which different connector sequence is capable of specifically linking to a nucleic acid molecule of the plurality of nucleic acid molecules that comprises a sequence derived from any single given TCR V gene.
In some embodiments, the first anti-connector sequence or the second anti-connector sequence comprises a TCR V gene sequence. In some embodiments, the TCR V gene sequence comprises at least three nucleotides of the TCR V gene adjacent to a sequence encoding a CDR3 in a rearranged gene. In some embodiments, the first anti-connector sequence or the second anti-connector sequence comprises a pre-determined sequence. In some embodiments, the first connector sequence hybridizes to the first anti-connector sequence. In some embodiments, the at least one nucleic acid molecule of (b) comprises a unique barcode. In some embodiments, the unique barcode is a primer binding site. In some embodiments, the first connector sequence of the at least one nucleic acid molecule comprises a unique barcode. In some embodiments, the unique barcode is a primer binding site.
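As a further non-limiting illustration, one way to reason about whether a set of different connector sequences could link specifically is to compare them pairwise. The sketch below flags pairs of equal-length connectors that differ at fewer than a chosen number of positions. The Hamming-distance criterion, the `min_distance` threshold, the function names, and the example sequences are all assumptions introduced here for illustration; the disclosure does not specify a particular specificity metric.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def ambiguous_pairs(
    connectors: Dict[str, str], min_distance: int
) -> List[Tuple[str, str]]:
    """Return name pairs whose connectors differ at fewer than
    `min_distance` positions and so might cross-hybridize (assumed model)."""
    return [
        (name_a, name_b)
        for (name_a, seq_a), (name_b, seq_b) in combinations(connectors.items(), 2)
        if hamming(seq_a, seq_b) < min_distance
    ]


# Placeholder connector set; "c1" and "c2" differ at a single position.
example_connectors = {"c1": "ACGTTG", "c2": "ACGTTC", "c3": "TTGACC"}
flagged = ambiguous_pairs(example_connectors, min_distance=2)
```

A real design would likely also consider melting temperature and partial-overlap hybridization, which this sketch deliberately leaves out.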

In another aspect, the present disclosure provides a method for generating a plurality of nucleic acid molecules, comprising: providing a first plurality of nucleic acid molecules, wherein a nucleic acid molecule of the first plurality of nucleic acid molecules comprises a sequence encoding a first CDR3 of a first T-cell receptor (TCR) chain and a second CDR3 of a second TCR chain, wherein the first CDR3 and the second CDR3 are from a cognate pair of TCR chains; providing a second plurality of nucleic acid molecules, wherein a nucleic acid molecule of the second plurality of nucleic acid molecules comprises a sequence derived from a TCR V gene, wherein the nucleic acid molecule does not comprise a sequence encoding a constant domain; and contacting the first plurality of nucleic acid molecules and the second plurality of nucleic acid molecules, wherein the nucleic acid molecule of the first plurality of nucleic acid molecules links with the nucleic acid molecule of the second plurality of nucleic acid molecules to form a nucleic acid molecule comprising the sequence encoding the first CDR3 and the second CDR3 and the sequence derived from the TCR V gene, wherein the sequence encoding the first CDR3 and the second CDR3 and the TCR V gene are derived from the cognate pair of TCR chains.

In some embodiments, each nucleic acid molecule of the first plurality of nucleic acid molecules comprises a sequence encoding a different first CDR3 of a first TCR chain and/or a different CDR3 of a second TCR chain. In some embodiments, the first plurality of nucleic acid molecules comprises at least about 2, 5, 10, 20, 50, 100, 200, 300, 400, 500, 1,000, 1,500, 2,000, 2,500, 3,000, 3,500, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 12,000, 15,000, 20,000, 100,000, 1,000,000, 10,000,000, or more different sequences. In some embodiments, each nucleic acid molecule of the second plurality of nucleic acid molecules comprises a sequence derived from a different TCR V gene. In some embodiments, the second plurality of nucleic acid molecules comprises at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, or more different TCR V genes. In some embodiments, the first plurality of nucleic acid molecules and the second plurality of nucleic acid molecules are contacted in a same compartment. In some embodiments, the nucleic acid molecule of the first plurality of nucleic acid molecules further comprises a connector sequence, wherein the connector sequence links the nucleic acid molecule of the first plurality of nucleic acid molecules and the nucleic acid molecule of the second plurality of nucleic acid molecules. In some embodiments, the nucleic acid molecule of the second plurality of nucleic acid molecules further comprises an anti-connector sequence, which anti-connector sequence is complementary to the connector sequence.
In some embodiments, the connector sequence hybridizes to the anti-connector sequence to link the nucleic acid molecule of the first plurality of nucleic acid molecules and the nucleic acid molecule of the second plurality of nucleic acid molecules. In some embodiments, the connector sequence is codon-diversified such that the connector sequence of the nucleic acid molecule of the first plurality of nucleic acid molecules is different from other connector sequences of other nucleic acid molecules of the first plurality of nucleic acid molecules. In some embodiments, the nucleic acid molecule of the first plurality of nucleic acid molecules further comprises a first J region of the first TCR chain and/or a second J region of the second TCR chain. In some embodiments, (i) the first TCR chain is a TCR alpha chain and the second TCR chain is a TCR beta chain or (ii) the first TCR chain is a TCR gamma chain and the second TCR chain is a TCR delta chain. In some embodiments, the TCR V gene is a TRAV gene, a TRBV gene, a TRGV gene, or a TRDV gene. In some embodiments, the nucleic acid molecule of the second plurality of nucleic acid molecules is a double-stranded nucleic acid molecule. In some embodiments, the nucleic acid molecule of the second plurality of nucleic acid molecules further comprises a sequence encoding a portion of a self-cleaving peptide. In some embodiments, the anti-connector sequence is an overhang of the nucleic acid molecule of the second plurality of nucleic acid molecules. In some embodiments, the connector sequence or the anti-connector sequence is at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 50, 60, 70, 80, 90, 100, 150, 200, or more nucleotides in length.
In some embodiments, the method further comprises (i) extending a 3′ end of the nucleic acid molecule of the first plurality of nucleic acid molecules hybridized thereto with the nucleic acid molecule of the second plurality of nucleic acid molecules and/or (ii) extending a 3′ end of the nucleic acid molecule of the second plurality of nucleic acid molecules hybridized thereto with the nucleic acid molecule of the first plurality of nucleic acid molecules. In some embodiments, the method further comprises ligating the nucleic acid molecule of the first plurality of nucleic acid molecules with the nucleic acid molecule of the second plurality of nucleic acid molecules.
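As a non-limiting illustration of codon diversification, the sketch below enumerates distinct nucleotide sequences that encode the same short amino acid sequence by substituting synonymous codons from the standard genetic code. The example peptide "GSG", the reduced three-residue codon table, and the function names are hypothetical choices made for illustration only; the disclosure does not state which amino acid sequence a connector encodes or how variants are selected.

```python
from itertools import islice, product
from typing import Dict, List

# Synonymous codons from the standard genetic code, limited to the
# residues used in this illustration (glycine, serine, alanine).
SYNONYMOUS_CODONS: Dict[str, List[str]] = {
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "A": ["GCT", "GCC", "GCA", "GCG"],
}

# Reverse lookup used to confirm that each variant encodes the same peptide.
CODON_TO_AA: Dict[str, str] = {
    codon: aa for aa, codons in SYNONYMOUS_CODONS.items() for codon in codons
}


def codon_variants(peptide: str, limit: int) -> List[str]:
    """Return up to `limit` distinct nucleotide sequences encoding `peptide`."""
    choices = (SYNONYMOUS_CODONS[aa] for aa in peptide)
    return list(islice(("".join(c) for c in product(*choices)), limit))


def translate(seq: str) -> str:
    """Translate an in-frame sequence using the reduced table above."""
    return "".join(CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq), 3))


# Five different connector candidates that all encode the placeholder "GSG".
variants = codon_variants("GSG", limit=5)
```

Because every variant translates to the same amino acid sequence, connectors diversified in this way can differ at the nucleotide level, and so hybridize to different anti-connectors, without changing the encoded protein.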

In some embodiments, the method further comprises contacting the nucleic acid molecule comprising the sequence encoding the first CDR3 and the second CDR3 and the sequence derived from the TCR V gene with a restriction enzyme to generate a sticky end. In some embodiments, the method further comprises contacting the nucleic acid molecule comprising the sequence encoding the first CDR3 and the second CDR3 and the sequence derived from the TCR V gene with an additional nucleic acid molecule comprising a sequence encoding a constant region or portion thereof. In some embodiments, the method further comprises ligating the nucleic acid molecule comprising the sequence encoding the first CDR3 and the second CDR3 and the sequence derived from the TCR V gene with the additional nucleic acid molecule through the sticky end. In some embodiments, the sequence encoding the first CDR3 and the sequence encoding the second CDR3 are separated by at most about 100, 90, 80, 70, 60, 50, 40, 30, 20, 10, or 5 nucleotides. In some embodiments, the sequence derived from the TCR V gene comprises a sequence encoding FR1, CDR1, FR2, CDR2, and FR3. In some embodiments, the sequence derived from the TCR V gene comprises a sequence encoding L-PART1, L-PART2, FR1, CDR1, FR2, CDR2, and FR3.

In another aspect, the present disclosure provides a composition comprising: a first plurality of nucleic acid molecules, wherein each nucleic acid molecule of the first plurality of nucleic acid molecules comprises a sequence encoding a first CDR3 of a first T-cell receptor (TCR) chain and a second CDR3 of a second TCR chain, wherein the first CDR3 and the second CDR3 are from a cognate pair of TCR chains; and a second plurality of nucleic acid molecules, wherein each nucleic acid molecule of the second plurality of nucleic acid molecules comprises a sequence derived from a TCR V gene, and wherein each nucleic acid molecule of the second plurality of nucleic acid molecules does not comprise a sequence encoding the first CDR3 and the second CDR3; wherein (i) each nucleic acid molecule of the first plurality of nucleic acid molecules comprises a sequence encoding a different first CDR3 and/or second CDR3, and/or (ii) each nucleic acid molecule of the second plurality of nucleic acid molecules comprises a sequence derived from a different TCR V gene. In some embodiments, the first plurality of nucleic acid molecules comprises at least about 2, 5, 10, 20, 50, 100, 200, 300, 400, 500, 1,000, 1,500, 2,000, 2,500, 3,000, 3,500, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 12,000, 15,000, 20,000, 100,000, 1,000,000, 10,000,000, or more different sequences. In some embodiments, the second plurality of nucleic acid molecules comprises at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, or more different TCR V genes.

In some embodiments, each nucleic acid molecule of the first plurality of nucleic acid molecules further comprises a connector sequence, wherein a given connector sequence is usable to link a given nucleic acid molecule of the first plurality of nucleic acid molecules and a given nucleic acid molecule of the second plurality of nucleic acid molecules. In some embodiments, each nucleic acid molecule of the second plurality of nucleic acid molecules further comprises an anti-connector sequence, which anti-connector sequence is complementary to the connector sequence. In some embodiments, the connector sequence is codon-diversified such that the given connector sequence of the given nucleic acid molecule of the first plurality of nucleic acid molecules is different from other connector sequences of other nucleic acid molecules of the first plurality of nucleic acid molecules. In some embodiments, the connector sequence encodes an amino acid sequence. In some embodiments, the connector sequence is in frame with the sequence encoding the first CDR3 of the first TCR chain and the second CDR3 of the second TCR chain. In some embodiments, the connector sequence comprises at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 50, 60, 70, 80, 90, 100, 150, 200, or more nucleotides. In some embodiments, the connector sequence comprises at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 50, 60, 70, 80, 90, 100, 150, 200, or more nucleotides of the TCR V gene adjacent to the sequence encoding the first CDR3 of the first TCR chain or the second CDR3 of the second TCR chain. In some embodiments, a given amino acid sequence encoded by the given connector sequence is the same or substantially the same as at least one other amino acid sequence encoded by at least one other connector sequence.
In some embodiments, a given amino acid sequence encoded by the given connector sequence is different from other amino acid sequences encoded by other connector sequences. In some embodiments, each nucleic acid molecule of the first plurality of nucleic acid molecules further comprises a first J region of the first TCR chain and/or a second J region of the second TCR chain. In some embodiments, the composition is a liquid composition. In some embodiments, the first plurality of nucleic acid molecules and the second plurality of nucleic acid molecules are within a same compartment. In some embodiments, the given nucleic acid molecule of the first plurality of nucleic acid molecules is linked to the given nucleic acid molecule of the second plurality of nucleic acid molecules through the given connector sequence. In some embodiments, the given nucleic acid molecule of the first plurality of nucleic acid molecules hybridizes to the given nucleic acid molecule of the second plurality of nucleic acid molecules through the given connector sequence hybridized to a given anti-connector sequence. In some embodiments, the sequence encoding the first CDR3 and the sequence encoding the second CDR3 are separated by at most 100 nucleotides. In some embodiments, the sequence derived from the TCR V gene comprises a sequence encoding FR1, CDR1, FR2, CDR2, and FR3. In some embodiments, the sequence derived from the TCR V gene comprises a sequence encoding L-PART1, L-PART2, FR1, CDR1, FR2, CDR2, and FR3. In some embodiments, each nucleic acid molecule of the first plurality of nucleic acid molecules or the second plurality of molecules is chemically synthesized. In some embodiments, each nucleic acid molecule of the first plurality of nucleic acid molecules is at most about 250, 240, 230, 220, 210, 200, 190, 180, 170, 160, 150, 140, 130, 120, 110, 100, or 50 nucleotides long.

In another aspect, the present disclosure provides a composition comprising a plurality of nucleic acid molecules, each nucleic acid molecule of the plurality of nucleic acid molecules comprising a sequence derived from a T-cell receptor (TCR) V gene sequence, wherein the plurality of nucleic acid molecules comprises a first nucleic acid molecule having a first connector sequence and a second nucleic acid molecule having a second connector sequence, wherein the first connector sequence is different from the second connector sequence.

In some embodiments, each nucleic acid molecule of the plurality of nucleic acid molecules comprises a sequence derived from a different TCR V gene. In some embodiments, each nucleic acid molecule of the plurality of nucleic acid molecules comprises a different connector sequence. In some embodiments, each nucleic acid molecule of the plurality of nucleic acid molecules does not comprise a sequence encoding a CDR3 of a TCR chain. In some embodiments, each nucleic acid molecule of the plurality of nucleic acid molecules does not comprise a sequence encoding a constant domain of a TCR chain. In some embodiments, the sequence derived from the TCR V gene comprises at least 10, 20, 30, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, or more nucleotides of the TCR V gene. In some embodiments, the TCR V gene is a TRAV gene, a TRBV gene, a TRGV gene, or a TRDV gene.

In another aspect, the present disclosure provides a composition comprising a plurality of nucleic acid molecules, each nucleic acid molecule of the plurality of nucleic acid molecules encoding a CDR3 or a portion thereof of a T-cell receptor (TCR) chain, wherein the plurality of nucleic acid molecules comprises a first nucleic acid molecule having a first connector sequence and a second nucleic acid molecule having a second connector sequence, wherein the first connector sequence is different from the second connector sequence.

In some embodiments, each nucleic acid molecule of the plurality of nucleic acid molecules further comprises a J region of a TCR chain. In some embodiments, each nucleic acid molecule of the plurality of nucleic acid molecules encodes a first CDR3 or a portion thereof of a first TCR chain and a second CDR3 or a portion thereof of a second TCR chain. In some embodiments, each nucleic acid molecule of the plurality of nucleic acid molecules further comprises a first J region of a first TCR chain and a second J region of a second TCR chain. In some embodiments, each nucleic acid molecule of the plurality of nucleic acid molecules encodes a different CDR3 or a portion thereof of a different TCR chain. In some embodiments, each nucleic acid molecule of the plurality of nucleic acid molecules comprises a different connector sequence. In some embodiments, each nucleic acid molecule of the plurality of nucleic acid molecules does not comprise greater than 200, 150, 100, 80, 50, 40, 30, 20, or 10 nucleotides of a TCR V gene. In some embodiments, each nucleic acid molecule of the plurality of nucleic acid molecules does not comprise a sequence encoding a constant domain of a TCR chain. In some embodiments, the first connector sequence or the second connector sequence comprises a sequence derived from a TCR V gene. In some embodiments, the sequence derived from the TCR V gene comprises at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 50, 60, 70, 80, 90, 100, 150, 200, or more nucleotides of the TCR V gene adjacent to a sequence encoding a CDR3 in a rearranged gene. In some embodiments, the first connector sequence or the second connector sequence comprises a pre-determined sequence. In some embodiments, the first connector sequence or the second connector sequence comprises a sequence complementary to a TCR V gene sequence.
In some embodiments, the composition further comprises a second plurality of nucleic acid molecules, each nucleic acid molecule of the second plurality of nucleic acid molecules comprising a sequence derived from a TCR V gene. In some embodiments, a first nucleic acid molecule of the second plurality comprises a first anti-connector sequence, which first anti-connector sequence is complementary to the first connector sequence. In some embodiments, a second nucleic acid molecule of the second plurality comprises a second anti-connector sequence, which second anti-connector sequence is complementary to the second connector sequence. In some embodiments, the first anti-connector sequence of the first nucleic acid molecule of the second plurality is linked to the first connector sequence of the first nucleic acid molecule of the first plurality. In some embodiments, the second anti-connector sequence of the second nucleic acid molecule of the second plurality is linked to the second connector sequence of the second nucleic acid molecule of the first plurality.
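The connector and anti-connector relationship described above can be illustrated with a minimal sketch. It assumes, for illustration only, that "complementary" means an exact reverse complement over the full connector length, and it models each molecule as a small dictionary. The function names, field names, and sequences are hypothetical placeholders and are not taken from the disclosure.

```python
# Illustrative sketch only: names and sequences are hypothetical placeholders.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence containing only A/C/G/T."""
    return seq.translate(COMPLEMENT)[::-1]


def pair_by_connector(cdr3_j_pool, v_gene_pool):
    """Pair each CDR3-J molecule with the V-gene molecule whose anti-connector
    is the exact reverse complement of the CDR3-J molecule's connector."""
    by_anti_connector = {v["anti_connector"]: v for v in v_gene_pool}
    pairs = []
    for molecule in cdr3_j_pool:
        partner = by_anti_connector.get(reverse_complement(molecule["connector"]))
        if partner is not None:
            pairs.append((partner["name"], molecule["name"]))
    return pairs


# Second plurality: V-gene-derived molecules, each with its own anti-connector.
v_gene_pool = [
    {"name": "V_gene_A", "anti_connector": reverse_complement("GATTACAGGC")},
    {"name": "V_gene_B", "anti_connector": reverse_complement("CCATGGTTAC")},
]

# First plurality: CDR3-J molecules; several may share the same connector.
cdr3_j_pool = [
    {"name": "cdr3_j_1", "connector": "GATTACAGGC"},
    {"name": "cdr3_j_2", "connector": "CCATGGTTAC"},
    {"name": "cdr3_j_3", "connector": "GATTACAGGC"},
]

pairs = pair_by_connector(cdr3_j_pool, v_gene_pool)
```

In this sketch, two CDR3-J molecules carrying the same connector are both assigned to the same V-gene-derived molecule, which mirrors a pool in which the number of CDR3-J molecules exceeds the number of V-gene-derived molecules.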

In another aspect, the present disclosure provides a composition comprising a plurality of nucleic acid molecules, each comprising a sequence encoding at least ten amino acids (e.g., in some cases, encoding at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 100, 200, or more amino acids) of a T-cell receptor (TCR) chain, wherein the plurality of nucleic acid molecules comprises a first nucleic acid molecule having a first connector sequence and a second nucleic acid molecule having a second connector sequence, wherein the first connector sequence is different from the second connector sequence, wherein the first connector sequence or the second connector sequence encodes a portion of a TCR chain and wherein the first connector sequence or the second connector sequence is in frame with the sequence encoding at least ten (e.g., in some cases, encoding at least about 10, 15, 20, 25, 30, 35, 40, 45, 50, 100, 200, or more amino acids) amino acids of a TCR chain.

In some embodiments, the first connector sequence or the second connector sequence comprises at least 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 50, 60, 70, 80, 90, 100, 150, 200, or more contiguous nucleotides of a TCR chain gene and is in frame with the sequence encoding at least ten amino acids of a TCR chain. In some embodiments, the first connector sequence and the second connector sequence encode at least two contiguous amino acids of a TCR chain. In some embodiments, the TCR chain of the portion of the TCR chain and the TCR chain encoded by the sequence encoding at least ten amino acids is the same. In some embodiments, each nucleic acid molecule of the plurality of nucleic acid molecules comprises a sequence derived from a TCR V gene. In some embodiments, each nucleic acid molecule of the plurality of nucleic acid molecules encodes a CDR3 or portion thereof of the TCR chain. In some embodiments, each nucleic acid molecule of the plurality of nucleic acid molecules further comprises a J region of the TCR chain. In some embodiments, each nucleic acid molecule of the plurality of nucleic acid molecules encodes a first CDR3 or portion thereof of a first TCR chain and a second CDR3 or portion thereof of a second TCR chain. In some embodiments, each nucleic acid molecule of the plurality of nucleic acid molecules further comprises a first J region of a first TCR chain and a second J region of a second TCR chain. In some embodiments, a sequence encoding the first CDR3 or portion thereof and a sequence encoding the second CDR3 or portion thereof are separated by at most 100, 90, 80, 70, 60, 50, 40, 30, 20, 10, or 5 nucleotides. In some embodiments, the first connector sequence or the second connector sequence comprises a sequence derived from a TCR V gene. In some embodiments, the first connector sequence or the second connector sequence comprises a pre-determined sequence.
In some embodiments, the first connector sequence comprises at least one nucleotide that is different from a nucleotide of the second connector sequence. In some embodiments, the first connector sequence encodes a same amino acid sequence as the second connector sequence. In some embodiments, the first connector sequence encodes a different amino acid sequence from the second connector sequence.

In another aspect, the present disclosure provides a method for generating a plurality of nucleic acid molecules, each nucleic acid molecule of the plurality encoding a T-cell receptor (TCR) chain or region thereof, the method comprising: contacting a first plurality of nucleic acid molecules and a second plurality of nucleic acid molecules to generate a third plurality of nucleic acid molecules comprising at least two (e.g., at least about 5, 10, 20, 50, 100, 200, 300, 400, 500, 1,000, 1,500, 2,000, 2,500, 3,000, 3,500, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 12,000, 15,000, 20,000, 100,000, 1,000,000, 10,000,000, or more) different nucleic acid molecules, wherein each of the at least two different nucleic acid molecules has a different sequence encoding a different TCR chain or region thereof, and wherein the at least two different nucleic acid molecules are generated in a same compartment.

In some embodiments, each nucleic acid molecule of the first plurality of nucleic acid molecules comprises a sequence encoding a CDR3 of the TCR chain. In some embodiments, each nucleic acid molecule of the first plurality of nucleic acid molecules comprises a J region of the TCR chain. In some embodiments, each nucleic acid molecule of the second plurality of nucleic acid molecules comprises a sequence derived from a TCR V gene of the TCR chain. In some embodiments, the TCR V gene is a human TCR V gene. In some embodiments, the TCR V gene is a human TRAV1-1, TRAV1-2, TRAV2, TRAV3, TRAV4, TRAV5, TRAV6, TRAV7, TRAV8-1, TRAV8-2, TRAV8-3, TRAV8-4, TRAV8-6, TRAV9-1, TRAV9-2, TRAV10, TRAV12-1, TRAV12-2, TRAV12-3, TRAV13-1, TRAV13-2, TRAV14, TRAV16, TRAV17, TRAV18, TRAV19, TRAV20, TRAV21, TRAV22, TRAV23, TRAV24, TRAV25, TRAV26-1, TRAV26-2, TRAV27, TRAV29, TRAV30, TRAV34, TRAV35, TRAV36, TRAV38-1, TRAV38-2, TRAV39, TRAV40, or TRAV41. In some embodiments, the TCR V gene is a human TRBV2, TRBV3-1, TRBV4-1, TRBV4-2, TRBV4-3, TRBV5-1, TRBV5-4, TRBV5-5, TRBV5-6, TRBV5-8, TRBV6-1, TRBV6-2, TRBV6-3, TRBV6-4, TRBV6-5, TRBV6-6, TRBV6-8, TRBV6-9, TRBV7-2, TRBV7-3, TRBV7-4, TRBV7-6, TRBV7-7, TRBV7-8, TRBV7-9, TRBV9, TRBV10-1, TRBV10-2, TRBV10-3, TRBV11-1, TRBV11-2, TRBV11-3, TRBV12-3, TRBV12-4, TRBV12-5, TRBV13, TRBV14, TRBV15, TRBV16, TRBV18, TRBV19, TRBV20-1, TRBV24-1, TRBV25-1, TRBV27, TRBV28, TRBV29-1, or TRBV30. In some embodiments, the sequence derived from the TCR V gene comprises a sequence encoding FR1, CDR1, FR2, CDR2, and FR3. In some embodiments, the sequence derived from the TCR V gene comprises a sequence encoding L-PART1, L-PART2, FR1, CDR1, FR2, CDR2, and FR3. In some embodiments, the TCR chain is a TCR alpha chain, a TCR beta chain, a TCR gamma chain, or a TCR delta chain. In some embodiments, each nucleic acid molecule of the first plurality of nucleic acid molecules further comprises an additional sequence encoding an additional CDR3 of an additional TCR chain.
In some embodiments, each nucleic acid molecule of the first plurality of nucleic acid molecules comprises an additional J region of the additional TCR chain. In some embodiments, the TCR chain and the additional TCR chain are a cognate pair of TCR chains. In some embodiments, a nucleic acid molecule of the plurality of nucleic acid molecules encodes a different TCR or portion thereof. In some embodiments, a given nucleic acid molecule of the first plurality of nucleic acid molecules comprises a connector sequence, which connector sequence is usable for linking the given nucleic acid molecule of the first plurality of nucleic acid molecules to a given nucleic acid molecule of the second plurality of nucleic acid molecules. In some embodiments, the given nucleic acid molecule of the first plurality of nucleic acid molecules and the given nucleic acid molecule of the second plurality of nucleic acid molecules encode a functional TCR chain or portion thereof. In some embodiments, the given nucleic acid molecule of the second plurality of nucleic acid molecules comprises an anti-connector sequence, which anti-connector sequence is complementary to the connector sequence of the given nucleic acid molecule of the first plurality of nucleic acid molecules. In some embodiments, the method further comprises linking the given nucleic acid molecule of the first plurality of nucleic acid molecules and the given nucleic acid molecule of the second plurality of nucleic acid molecules. In some embodiments, linking comprises hybridizing the given nucleic acid molecule of the first plurality of nucleic acid molecules and the given nucleic acid molecule of the second plurality of nucleic acid molecules. In some embodiments, hybridizing comprises hybridizing the connector sequence of the given nucleic acid molecule of the first plurality of nucleic acid molecules with the anti-connector sequence of the given nucleic acid molecule of the second plurality of nucleic acid molecules.
In some embodiments, the method further comprises (i) extending a free 3′ end of the given nucleic acid molecule of the second plurality of nucleic acid molecules using the given nucleic acid molecule of the first plurality of nucleic acid molecules as a template, and/or (ii) extending a free 3′ end of the nucleic acid molecule of the first plurality of nucleic acid molecules using the given nucleic acid molecule of the second plurality of nucleic acid molecules as a template, to generate a nucleic acid molecule of the third plurality of nucleic acid molecules. In some embodiments, the method further comprises ligating the given nucleic acid molecule of the first plurality of nucleic acid molecules and the given nucleic acid molecule of the second plurality of nucleic acid molecules. In some embodiments, the method further comprises contacting the nucleic acid molecule of the third plurality of nucleic acid molecules with a restriction enzyme to generate a sticky end. In some embodiments, the method further comprises contacting the nucleic acid molecule of the third plurality of nucleic acid molecules with an additional nucleic acid molecule. In some embodiments, the additional nucleic acid molecule encodes a constant region or a portion thereof of a TCR chain. In some embodiments, the method further comprises ligating the nucleic acid molecule of the third plurality of nucleic acid molecules and the additional nucleic acid molecule. In some embodiments, at least five (e.g., in some cases, at least about 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 1,000, 2,000, 3,000, 4,000, 5,000, 10,000, 20,000, 30,000, 40,000, or more) different nucleic acid molecules of the third plurality of nucleic acid molecules are generated in the same compartment. In some embodiments, at least ten different nucleic acid molecules of the third plurality of nucleic acid molecules are generated in the same compartment. In some embodiments, the same compartment is a well, a tube, or a droplet.

In another aspect, the present disclosure provides a method for generating a plurality of nucleic acid molecules, comprising: (a) providing a first plurality of nucleic acid molecules, wherein a nucleic acid molecule of the first plurality of nucleic acid molecules comprises a sequence encoding a first CDR3 of a first T-cell receptor (TCR) chain and a second CDR3 of a second TCR chain, wherein the first CDR3 and the second CDR3 are from a cognate pair of TCR chains; (b) providing a second plurality of nucleic acid molecules, wherein a nucleic acid molecule of the second plurality of nucleic acid molecules comprises a sequence derived from a TCR V gene; and (c) contacting the first plurality of nucleic acid molecules and the second plurality of nucleic acid molecules, wherein the nucleic acid molecule of the first plurality of nucleic acid molecules links with the nucleic acid molecule of the second plurality of nucleic acid molecules to form a linear nucleic acid molecule comprising the sequence encoding the first CDR3 and the second CDR3 and the sequence derived from the TCR V gene, wherein the sequence encoding the first CDR3 and the second CDR3 and the TCR V gene are derived from the cognate pair of TCR chains. In some embodiments, the first plurality of nucleic acid molecules comprises at least about 5, 10, 20, 50, 100, 200, 300, 400, 500, 1,000, 1,500, 2,000, 2,500, 3,000, 3,500, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 12,000, 15,000, 20,000, 100,000, 1,000,000, 10,000,000, or more different sequences. In some embodiments, the second plurality of nucleic acid molecules comprises at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, or more different TCR V genes.

In another aspect, the present disclosure provides a method for generating a plurality of nucleic acid molecules, comprising: (a) providing a first plurality of nucleic acid molecules, wherein a nucleic acid molecule of the first plurality of nucleic acid molecules comprises (i) a synthetic sequence encoding a first CDR3 of a first T-cell receptor (TCR) chain and a second CDR3 of a second TCR chain and (ii) a synthetic sequence encoding a third CDR3 of a third T-cell receptor (TCR) chain and a fourth CDR3 of a fourth TCR chain, wherein the first CDR3 and the second CDR3 are from a first cognate pair of TCR chains and wherein the third CDR3 and the fourth CDR3 are from a second cognate pair of TCR chains; (b) providing a second plurality of nucleic acid molecules, wherein a nucleic acid molecule of the second plurality of nucleic acid molecules comprises a sequence derived from a TCR V gene; and (c) contacting the first plurality of nucleic acid molecules and the second plurality of nucleic acid molecules, wherein the nucleic acid molecule of the first plurality of nucleic acid molecules links with the nucleic acid molecule of the second plurality of nucleic acid molecules to form a nucleic acid molecule comprising the sequence encoding the first CDR3 and the second CDR3 and the sequence derived from the TCR V gene, wherein the sequence encoding the first CDR3 and the second CDR3 and the TCR V gene are derived from the cognate pair of TCR chains. In some embodiments, the first plurality of nucleic acid molecules comprises at least about 2, 5, 10, 20, 50, 100, 200, 300, 400, 500, 1,000, 1,500, 2,000, 2,500, 3,000, 3,500, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 12,000, 15,000, 20,000, 100,000, 1,000,000, 10,000,000, or more different sequences.
In some embodiments, the second plurality of nucleic acid molecules comprises at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, or more different TCR V genes.

In another aspect, the present disclosure provides a method of identifying a sequence of a natively paired T-cell receptor (TCR) in a tissue sample from a subject, comprising: (a) identifying one or more paired sequences of one or more natively paired TCRs in a sample containing a plurality of peripheral T cells obtained from the subject, wherein each of the one or more paired sequences comprises a CDR3 sequence; and (b) identifying a tissue CDR3 sequence of a TCR chain of a TCR in the tissue sample for which the other TCR chain to which it is natively paired is unknown, wherein the tissue CDR3 sequence matches a CDR3 sequence of at least one paired sequence of the one or more paired sequences of the one or more natively paired TCRs, thereby identifying the at least one paired sequence as the sequence of the natively paired TCR in the tissue sample. In some embodiments, identifying in (a) comprises sequencing the one or more natively paired TCRs in the sample containing the plurality of peripheral T cells. In some embodiments, the sequencing comprises single cell sequencing. In some embodiments, the single cell sequencing comprises partitioning the plurality of peripheral T cells into a plurality of compartments, each compartment comprising an individual peripheral T cell of the plurality of peripheral T cells. In some embodiments, the tissue sample is not a bodily fluid sample. In some embodiments, the tissue sample is a solid tumor sample. In some embodiments, the tissue sample is a fixed or frozen sample. In some embodiments, the sample containing the plurality of peripheral T cells is a peripheral blood mononuclear cell (PBMC) sample. In some embodiments, the method further comprises, prior to (a), obtaining a blood sample from the subject. In some embodiments, the method further comprises, prior to (a), isolating peripheral blood mononuclear cells from the blood sample. In some embodiments, the tissue sample comprises a tumor-infiltrating T cell.
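The matching in step (b) above can be sketched as a simple lookup. The sketch assumes, for illustration only, that paired sequences from the peripheral sample are available as (alpha CDR3, beta CDR3) tuples of amino acid strings and that "matches" means an exact string match. The function names and the CDR3 strings are made-up placeholders, not data from the disclosure.

```python
from collections import defaultdict


def index_paired_tcrs(paired_tcrs):
    """Index natively paired (alpha CDR3, beta CDR3) tuples by each chain's CDR3."""
    index = defaultdict(list)
    for alpha_cdr3, beta_cdr3 in paired_tcrs:
        index[("alpha", alpha_cdr3)].append((alpha_cdr3, beta_cdr3))
        index[("beta", beta_cdr3)].append((alpha_cdr3, beta_cdr3))
    return index


def find_native_pairs(chain, tissue_cdr3, index):
    """Return every paired sequence whose CDR3 for the given chain exactly
    matches the single-chain CDR3 observed in the tissue sample."""
    return list(index.get((chain, tissue_cdr3), []))


# Hypothetical paired sequences identified from peripheral T cells in step (a).
paired_from_blood = [
    ("CAVRDSNYQLIW", "CASSLGQAYEQYF"),
    ("CAASGGSYIPTF", "CASSPTGGYEQYF"),
]
index = index_paired_tcrs(paired_from_blood)

# A beta-chain CDR3 seen in the tissue sample with an unknown partner chain.
matches = find_native_pairs("beta", "CASSLGQAYEQYF", index)

# A tissue CDR3 with no counterpart in the peripheral sample yields no match.
no_match = find_native_pairs("beta", "CASSQDRGYEQYF", index)
```

A list is returned rather than a single pair because the same CDR3 of one chain could, in principle, appear in more than one paired sequence.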

In another aspect, the present disclosure provides a method of identifying a target-reactive T-cell receptor (TCR), comprising: (a) providing a cell comprising the TCR identified using the methods described herein; and (b) contacting the cell with a target antigen presented by an antigen-presenting cell (APC), wherein the cell binds to the target antigen presented by the APC via the TCR, thereby identifying the TCR as the target-reactive TCR. In some embodiments, the target antigen is a tumor antigen (e.g., tumor-associated antigens or tumor-specific antigens). In some embodiments, the method further comprises delivering a sequence encoding the target-reactive TCR into a host cell. In some embodiments, the method further comprises administering the host cell into the subject. In some embodiments, the host cell is a T cell. In some embodiments, the T cell is an autologous T cell. In some embodiments, the T cell is an allogeneic T cell. In some embodiments, the cell is a reporter cell line, which reporter cell line comprises a reporter gene that is expressed upon the cell binding to the target antigen presented by the APC.

Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in this art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.

INCORPORATION BY REFERENCE

All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. To the extent publications and patents or patent applications incorporated by reference contradict the disclosure contained in the specification, the specification is intended to supersede and/or take precedence over any such contradictory material.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings (also “Figure”, “Fig.”, and “FIGURE” herein) of which:

FIGS. 1A-1C depict an example scheme of generating a nucleic acid construct encoding a T-cell receptor.

FIG. 2A depicts an example simulation result using the methods described herein.

FIG. 2B depicts an example simulation result using the methods described herein.

FIG. 3A depicts an example simulation result using the methods described herein.

FIG. 3B depicts an example simulation result using the methods described herein.

FIG. 4A depicts a schematic of germline genomic DNA of a TCR V gene.

FIG. 4B depicts a schematic of rearranged genomic DNA of a TCR V-J gene.

FIG. 4C depicts a schematic of rearranged genomic DNA of a TCR V-D-J gene.

FIG. 5 depicts a scheme of a potential challenge associated with linking a CDR3-J polynucleotide to the correct V gene germline polynucleotide. The dashed arrows depict that linking can happen between the CDR3-J polynucleotide and the incorrect V gene germline polynucleotide.

FIG. 6 depicts a scheme of linking a CDR3-J polynucleotide (the gray box connected to the white box) to the designated, pre-synthesized V gene germline polynucleotide (the black box connected to the gray box pointed to by the thin arrow), by overlapping primer extension. The top thick arrow (603) depicts hybridization between the connector sequence on the pre-synthesized V gene germline polynucleotide (601) and the connector sequence on the CDR3-J polynucleotide (602). The bottom thick arrow (604) depicts primer extension. 601 may be referred to as a connector sequence and 602 may be referred to as an anti-connector sequence (or vice versa).

FIG. 7 depicts linking a CDR3-J polynucleotide and the designated V gene germline polynucleotide using arbitrary connector (701) and anti-connector (702) sequences.

FIG. 8 depicts a general principle of TCR gene self-assembly. 801: a pre-synthesized V gene germline polynucleotide. 802: a polynucleotide comprising a CDR3-J sequence (e.g., a CDR3-J polynucleotide). 803: a nucleic acid sequence comprising a V gene germline polynucleotide sequence and a CDR3-J sequence. X is the number of polynucleotides each being a portion of a different V gene germline polynucleotide. Y is the number of CDR3-J polynucleotides. Y may be much larger than X. The arrow indicates a bulk reaction where each CDR3-J polynucleotide is linked to the designated, pre-synthesized V gene germline polynucleotide.

FIG. 9A depicts an example workflow of using a blood sample to identify tumor-infiltrating TCRs in the tumor sample.

FIG. 9B depicts an example application of TCRs identified using the method shown in FIG. 9A.

FIG. 9C depicts an example application of TCRs identified using the method shown in FIG. 9A.

FIG. 10A depicts an example simulation result using the methods described herein.

FIG. 10B depicts an example simulation result using the methods described herein.

FIG. 11A depicts an example simulation result using the methods described herein.

FIG. 11B depicts an example simulation result using the methods described herein.

FIG. 12 depicts example next-generation sequencing data assessing the gene assembly methods described herein.

FIG. 13 depicts example next-generation sequencing data assessing the gene assembly methods described herein.

FIG. 14 depicts example next-generation sequencing data assessing the gene assembly methods described herein.

FIG. 15 depicts example next-generation sequencing data assessing the gene assembly methods described herein.

FIG. 16 depicts example next-generation sequencing data assessing the gene assembly methods described herein.

FIG. 17 depicts example next-generation sequencing data assessing the gene assembly methods described herein.

FIG. 18 depicts example next-generation sequencing data assessing the gene assembly methods described herein.

DETAILED DESCRIPTION OF THE INVENTION

In this disclosure, the use of the singular includes the plural unless specifically stated otherwise. Also, the use of “or” means “and/or” unless stated otherwise. Similarly, “comprise,” “comprises,” “comprising,” “include,” “includes,” and “including” are not intended to be limiting.

The term “about” or “approximately” means within an acceptable error range for the particular value as determined by one of ordinary skill in the art, which will depend in part on how the value is measured or determined, i.e., the limitations of the measurement system. For example, “about” can mean within 1 or more than 1 standard deviation, per the practice in the art. Alternatively, “about” can mean a range of up to 20%, up to 10%, up to 5%, or up to 1% of a given value. Alternatively, particularly with respect to biological systems or processes, the term can mean within an order of magnitude, e.g., within 5-fold, or within 2-fold, of a value. Where particular values are described in the application and claims, unless otherwise stated the term “about” meaning within an acceptable error range for the particular value should be assumed.

Whenever the term “at least,” “greater than,” or “greater than or equal to” precedes the first numerical value in a series of two or more numerical values, the term “at least,” “greater than” or “greater than or equal to” applies to each of the numerical values in that series of numerical values. For example, greater than or equal to 1, 2, or 3 is equivalent to greater than or equal to 1, greater than or equal to 2, or greater than or equal to 3.

Whenever the term “no more than,” “less than,” or “less than or equal to” precedes the first numerical value in a series of two or more numerical values, the term “no more than,” “less than,” or “less than or equal to” applies to each of the numerical values in that series of numerical values. For example, less than or equal to 3, 2, or 1 is equivalent to less than or equal to 3, less than or equal to 2, or less than or equal to 1.

The terms “polynucleotide”, “nucleic acid” and “oligonucleotide” are used interchangeably in the present disclosure. They can refer to a polymeric form of nucleotides of various length. They may comprise deoxyribonucleotides and/or ribonucleotides, or analogs thereof. A polynucleotide may include one or more nucleotides selected from adenosine (A), cytosine (C), guanine (G), thymine (T) and uracil (U), or variants thereof. A nucleotide can include a nucleoside and at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more phosphate (PO₃) groups. A nucleotide can include a nucleobase, a five-carbon sugar (either ribose or deoxyribose), and one or more phosphate groups. A polynucleotide may have any three-dimensional structure and may perform various functions. A polynucleotide can have various configurations, such as linear, circular, stem-loop, and branched. The following are non-limiting examples of polynucleotides: coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA), short interfering RNA (siRNA), short-hairpin RNA (shRNA), micro-RNA (miRNA), circular RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. A polynucleotide may comprise one or more modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. A polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component. Polynucleotides may include one or more nucleotide variants, including nonstandard nucleotide(s), non-natural nucleotide(s), nucleotide analog(s) and/or modified nucleotides.

The term “sequence,” as used herein, refers to the order of nucleotides in a nucleic acid molecule, or the order of amino acid residues of a peptide. A nucleic acid sequence can be a deoxyribonucleic acid (DNA) sequence or ribonucleic acid (RNA) sequence; can be linear, circular or branched; and can be either single-stranded or double-stranded. A sequence can be mutated such that it is different from a reference sequence (e.g., wildtype sequence). A sequence can be of any length, for example, between 2 and 1,000,000 or more amino acids or nucleotides in length (or any integer value there between or there above), e.g., between about 100 and about 10,000 nucleotides or between about 200 and about 500 amino acids or nucleotides. In some cases, a given nucleic acid sequence can encompass the sequence information of the given nucleic acid sequence and a reverse complement sequence of the given nucleic acid sequence. In some cases, a DNA sequence can encompass the sequence information of the corresponding RNA sequence that is transcribed from the DNA. The sequence can be an alphabetical representation of a polynucleotide or polypeptide molecule. The sequence can be a piece of information that can be used by a computer processor. In some cases, the nucleic acid sequence may be used to refer to the physical nucleic acid molecule itself.

The term “blunt end,” as used herein, refers to an end of a double-stranded nucleic acid molecule wherein substantially all of the nucleotides in the end of one strand of the nucleic acid molecule are base paired with opposing nucleotides in the other strand of the same nucleic acid molecule. A nucleic acid molecule is not blunt ended if it has an end that includes a single-stranded portion of at least one nucleotide in length, referred to herein as an “overhang” or “sticky end.”

The term “TCR V gene,” as used herein, refers to a genomic nucleic acid sequence of a T-cell receptor variable (V) gene, in germline configuration, that comprises the sequence encoding the first part of the leader peptide (e.g., L-PART1 as defined in IMGT), an intron (e.g., V-INTRON as defined in IMGT) and an exon (e.g., V-EXON as defined in IMGT), with a 5′UTR and a 3′UTR (including recombination signal sequence). The recombination signal sequence can comprise a heptamer (e.g., V-HEPTAMER as defined in IMGT) and a nonamer (e.g., V-NONAMER as defined by IMGT), separated by a spacer element (e.g., V-SPACER as defined by IMGT). V-EXON encompasses the sequence encoding the second part of the leader peptide (L-PART2) and V-REGION. Examples of TCR V genes include TCR alpha variable (TRAV) genes, TCR beta variable (TRBV) genes, TCR gamma variable (TRGV) genes, and TCR delta variable (TRDV) genes. A nucleic acid described herein can comprise a sequence derived from the TCR V gene. “Derived from” means having a sequence identity of at least about 40%, at least about 50%, at least about 60%, at least about 70%, at least about 80%, at least about 90%, at least about 95%, or 100% with a reference sequence. A sequence derived from a TCR V gene can be a full length sequence of the genomic nucleic acid sequence of a TCR V gene as described above. A sequence derived from a TCR V gene can be a portion of the TCR V gene comprising at least 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, or more nucleotides of the TCR V gene. A sequence derived from a TCR V gene can be a codon-optimized (or codon-diversified) nucleic acid sequence. A codon-optimized sequence of a given nucleic acid sequence refers to a modified nucleic acid sequence whose protein-coding region encodes the same amino acid sequence as the protein-coding region of the given nucleic acid.
The modified nucleic acid sequence may have a sequence different from the given nucleic acid sequence or can be derived from the given nucleic acid. Codon optimization may be implemented to remove a restriction site, to remove unwanted secondary structure in the polynucleotide sequence, to promote correct linking of a CDR3-J polynucleotide and the designated pre-synthesized portion of a TCR V gene, or for other purposes. Codon optimization or codon diversification can be achieved by altering one or more nucleotides of a given nucleic acid sequence. For example, codon optimization or codon diversification can be achieved by computational methods. Codon optimization and codon diversification may be used interchangeably in the present disclosure.
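As a non-limiting illustration of a computational approach to codon diversification, the following minimal sketch replaces each codon with a synonymous codon where one exists, so that the nucleotide sequence changes while the encoded amino acid sequence does not. It assumes the standard genetic code and an in-frame coding sequence; the function names, the deterministic choice of the alphabetically first alternative codon, and the example sequence are illustrative assumptions and are not a procedure described in the disclosure.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def diversify(dna: str) -> str:
    """Swap each codon for a synonymous codon when one exists.

    The alphabetically first alternative is chosen so the result is
    deterministic; codons with no synonym (ATG, TGG) are left unchanged.
    """
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        synonyms = sorted(
            c for c, aa in CODON_TABLE.items()
            if aa == CODON_TABLE[codon] and c != codon
        )
        out.append(synonyms[0] if synonyms else codon)
    return "".join(out)


original = "TGTGCCAGCAGC"  # hypothetical fragment encoding Cys-Ala-Ser-Ser
variant = diversify(original)
```

A practical design tool would likely add further constraints, such as rejecting variants that contain a chosen restriction site or that are predicted to form secondary structure; those checks are omitted from this sketch.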

The term “V-REGION,” as used herein, refers to the coding region of a TCR V gene (including 1 or 2 nucleotides before the V-HEPTAMER, if present) in germline genomic DNA or cDNA, or to the variable (V) region, usually trimmed at the 3′ end by the V-(D)-J rearrangement, in rearranged genomic DNA or cDNA.

The term “D-REGION,” as used herein, refers to the coding region of a TCR D gene (including 1 or 2 nucleotide(s) after the 5′ D-HEPTAMER and/or before the 3′ D-HEPTAMER, if present) in germline genomic DNA or cDNA, or to the diversity (D) region, usually trimmed at the 5′ and/or 3′ end by the D-J or V-D-J rearrangement, in partially rearranged or rearranged genomic DNA or in cDNA.

The term “J-REGION,” as used herein, refers to the coding region of a TCR J gene (including 1 or 2 nucleotide(s) after the J-HEPTAMER, if present) in germline genomic DNA or cDNA, or to the joining (J) region, usually trimmed at the 5′ end by the V-(D)-J rearrangement, in rearranged genomic DNA or cDNA.

The term “V-J-REGION,” as used herein, refers to the coding region of a TCR chain that comprises V-REGION and J-REGION, in rearranged genomic DNA or cDNA.

The term “V-D-J-REGION,” as used herein, refers to the coding region of a TCR chain that comprises V-REGION, D-REGION, and J-REGION, in rearranged genomic DNA or cDNA.

The terms “link” and “connect” are used interchangeably in the present disclosure. They refer to physically linking two or more nucleic acid molecules. The two or more nucleic acid molecules may be linked such that they form a continuous nucleic acid molecule. The two or more nucleic acid molecules can be covalently linked or non-covalently linked. Linking may be accomplished in a variety of manners, including formation of hydrogen bonds, ionic and covalent bonds, or van der Waals forces.

Percent (%) sequence identity with respect to a reference nucleic acid sequence (or peptide sequence) is the percentage of nucleotides (or amino acid residues in case of peptide sequence) in a candidate sequence that are identical with the nucleotides (or amino acid residues) in the reference nucleic acid sequence (or peptide sequence), after aligning the sequences and introducing gaps, if necessary, to achieve the maximum percent sequence identity, and not considering any conservative substitutions as part of the sequence identity. Alignment for purposes of determining percent sequence identity can be achieved in various ways that are within the skill in the art, for instance, using publicly available computer software such as BLAST, BLAST-2, CLUSTALW, ALIGN or Megalign (DNASTAR) software. Those skilled in the art can determine appropriate parameters for aligning sequences, including any algorithms needed to achieve maximal alignment over the full length of the sequences being compared.
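As an illustration of the definition above, the following minimal Python sketch computes percent identity from two sequences that have already been aligned (for example, by one of the programs listed above). The function name, the gap symbol “-”, and the choice of the number of candidate residues as the denominator are assumptions made for this sketch; alignment programs can report identity over other denominators.

```python
def percent_identity(aligned_candidate: str, aligned_reference: str, gap: str = "-") -> float:
    """Percentage of candidate residues that are identical to the aligned reference residue.

    Both inputs must come from the same pairwise alignment, so they have equal
    length and use `gap` for inserted gap positions (an assumption of this sketch).
    """
    if len(aligned_candidate) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    candidate_residues = 0
    for c, r in zip(aligned_candidate.upper(), aligned_reference.upper()):
        if c == gap:
            continue  # gap in the candidate: no candidate residue at this column
        candidate_residues += 1
        if c == r:
            matches += 1  # identical residue; conservative substitutions are not counted
    if candidate_residues == 0:
        raise ValueError("candidate sequence contains no residues")
    return 100.0 * matches / candidate_residues
```

For example, the aligned pair “ACGT-A” (candidate) and “ACCTGA” (reference) has five candidate residues, four of which are identical to the reference, giving 80% identity under this convention.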

The term “substantially the same” and its grammatical equivalents as applied to nucleic acid or amino acid sequences mean that a nucleic acid or amino acid sequence comprises a sequence that has at least 90%, at least 95%, at least 98%, or at least 99% sequence identity compared to a reference sequence using the programs described above, e.g., BLAST, using standard parameters. For example, the BLASTN program (for nucleotide sequences) uses as defaults a word length (W) of 11, an expectation (E) of 10, M=5, N=−4, and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a word length (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1992)).

Overview

High-throughput, paired sequencing can be used to sequence T-cell receptors (TCRs). For example, with the development of single-cell technologies, individual T cells can be partitioned into insulated compartments where TCR alpha and beta chain mRNAs from the same T cell can be attached to the same, unique barcode. Some of these systems have been made commercially available (e.g., by 10X Genomics). Paired sequence information that records the T-cell receptor alpha variable (TRAV) gene identity, CDR3 alpha sequence, T-cell receptor alpha joining (TRAJ) gene identity, T-cell receptor beta variable (TRBV) gene identity, CDR3 beta sequence, and T-cell receptor beta joining (TRBJ) gene identity can allow reconstruction of the full-length, expressible TCR. However, the technologies to synthesize such TCR sequences in the form of DNA or RNA that can be introduced into cells for functional studies or screenings can be low-throughput. The current disclosure provides multiple methods and compositions that can allow ultrahigh-throughput construction of polynucleotides encoding TCR sequences (e.g., in some cases, paired, full-length, expressible TCR sequences).

T-Cell Receptor (TCR)

The TCR can be used to confer the ability of T cells to recognize antigens associated with various cancers or infectious organisms. The TCR is made up of two chains, e.g., an alpha (α) chain and a beta (β) chain or a gamma (γ) and a delta (δ) chain. The proteins which make up these chains are encoded by DNA, which employs a unique mechanism for generating the tremendous diversity of the TCR. This multi-subunit immune recognition receptor associates with the CD3 complex and binds peptides presented by the MHC class I and II proteins on the surface of antigen-presenting cells (APCs). Binding of a TCR to the antigenic peptide on the APC can be a central event in T-cell activation, which occurs at an immunological synapse at the point of contact between the T cell and the APC.

The TCR may recognize the T cell epitope in the context of an MHC class I molecule. MHC class I proteins can be expressed in all nucleated cells of higher vertebrates. The MHC class I molecule is a heterodimer composed of a 46-kDa heavy chain which is non-covalently associated with the 12-kDa light chain β-2 microglobulin. In humans, there are several MHC alleles, such as, for example, HLA-A2, HLA-A1, HLA-A3, HLA-A24, HLA-A28, HLA-A31, HLA-A33, HLA-A34, HLA-B7, HLA-B45 and HLA-Cw8. In some embodiments, the MHC class I allele is an HLA-A2 allele, which is expressed by approximately 50% of individuals in some populations. In some embodiments, the HLA-A2 allele can be an HLA-A*0201, *0202, *0203, *0206, or *0207 gene product. In some cases, there can be differences in the frequency of subtypes between different populations. For example, in some embodiments, more than 95% of the HLA-A2 positive Caucasian population is HLA-A*0201, whereas in the Chinese population the frequency has been reported to be approximately 23% HLA-A*0201, 45% HLA-A*0207, 8% HLA-A*0206 and 23% HLA-A*0203.

In some embodiments, the TCR may recognize the T cell epitope in the context of an MHC class II molecule. MHC class II proteins can be expressed in a subset of APCs. In humans, there are several MHC class II alleles, such as, for example, DR1, DR3, DR4, DR7, DR52, DQ1, DQ2, DQ4, DQ8 and DP1. In some embodiments, the MHC class II allele is an HLA-DRB1*0101, an HLA-DRB*0301, an HLA-DRB*0701, an HLA-DRB*0401 or an HLA-DQB1*0201 gene product.

The TCR chain can comprise a variable domain (or variable region) and a constant domain (or constant region). The variable domain can be a V-DOMAIN as defined by the IMGT unique numbering system. The variable domain can correspond to the V-J-REGION or V-D-J-REGION of a TCR chain. The constant domain can be a C-DOMAIN as defined by the IMGT unique numbering system. In some cases, the constant domain can be a portion of the constant region. For example, a full-length constant region can comprise the constant domain (an extracellular region), a connecting region, a transmembrane region, and a cytoplasmic region.

The variable domain of a TCRα or TCRγ chain can be encoded by a number of variable (V) and joining (J) gene segments in the germline, while the variable domain of a TCRβ or TCRδ chain is additionally encoded by diversity (D) gene segments. Each gene segment can be flanked by recombination signal sequences. The recombination signals can comprise a heptamer and a nonamer, separated by a spacer element. The spacer element can be 12 or 23 bp long. During V(D)J recombination, one random allele of each gene segment is recombined with the others to form a functional variable domain. Recombination of the variable domain with a constant (C) gene segment can result in a functional TCR chain transcript. Additionally, random nucleotides may be added and/or deleted at the junction sites between the gene segments. This process can lead to strong combinatorial diversity (depending on which gene regions recombine) and junctional diversity (depending on which and how many nucleotides are added or deleted), resulting in a large and highly variable TCR repertoire, which can ensure the identification of a plethora of antigens. Additional diversity can be achieved by the pairing (also referred to as “assembly”) of α and β or γ and δ chains to form a functional TCR. By recombination, random insertion, deletion and substitution, the small set of genes that encode the T cell receptor has the potential to create between 10¹⁵ and 10²⁰ TCR clonotypes. As used herein, a “clonotype” refers to a population of immune cells that carry an identical immunoreceptor. For example, a clonotype refers to a population of T cells that carry an identical TCR, or a population of B-cells that carry an identical BCR (or antibody). “Diversity” in the context of immunoreceptor diversity refers to the number of immunoreceptor (e.g., TCR, BCR and antibody) clonotypes in a population. As used herein, a “cognate pair combination” refers to the native combination of the two chains (e.g., TCRα and TCRβ, or TCRγ and TCRδ) of a TCR from a T cell.
The same cognate pair combination of the two chains can result in the same TCR. For example, the T cells having the same clonotype have the same cognate pair combinations of TCRα and TCRβ chains. The higher diversity in clonotype may indicate higher diversity in cognate pair combination.

Each TCR chain can contain three hypervariable loops in its structure, termed complementarity determining regions (CDR1-3). CDR1 and CDR2 can be encoded by V genes and may be required for interaction of the TCR with the MHC complex. CDR3, however, is encoded in part by (1) the junctional region between the V and J genes (in the case of TCRα or TCRγ), or (2) the junctional region between the V and D genes and the junctional region between the D and J genes (in the case of TCRβ or TCRδ), and therefore can be highly variable. CDR3 may be the region of the TCR in direct contact with the peptide antigen. CDR3 can be used as the region of interest to determine T cell clonotypes. The sum of all TCRs expressed by the T cells of one individual is termed the TCR repertoire or TCR profile. The TCR repertoire can change with the onset and progression of diseases. Therefore, determining the immune repertoire status under different disease conditions, such as cancer, autoimmune, inflammatory and infectious diseases, may be useful for disease diagnosis and prognosis.

TCR should be understood to encompass full-length TCRs as well as antigen-binding portions or antigen-binding fragments (also called MHC-peptide binding fragments) thereof. In some embodiments, the TCR is an intact or full-length TCR. In some embodiments, the TCR is an antigen-binding portion that is less than a full-length TCR but that binds to a specific antigenic peptide bound to an MHC molecule, e.g., an MHC-peptide complex. In some cases, an antigen-binding portion or fragment of a TCR can contain only a portion of the structural domains of a full-length or intact TCR, but yet is able to bind the epitope (e.g., MHC-peptide complex) to which the full TCR binds. In some cases, an antigen-binding portion or fragment of a TCR contains the variable domains of a TCR, such as variable α chain and variable β chain of a TCR, sufficient to form a binding site for binding to a specific MHC-peptide complex, such as generally where each chain contains three complementarity determining regions. Polypeptides or proteins having a binding domain which is an antigen-binding domain or is homologous to an antigen-binding domain are included.

A TCR molecule can be formed by an alpha chain (α chain or TCRα chain, encoded by TRA gene/sequence) and a beta chain (β chain or TCRβ chain, encoded by TRB gene/sequence), or a gamma chain (γ chain or TCRγ chain, encoded by TRG gene/sequence) and a delta chain (δ chain or TCRδ chain, encoded by TRD gene/sequence). These immunoreceptor chains can have variable domains (e.g., encoded by the rearranged VDJ or VJ regions). Parts of the variable domains can be hypervariable. The hypervariable regions can include complementarity determining regions (CDRs), for example, CDR1, CDR2 and CDR3. In some cases, within one T cell, only one functional α chain sequence and one functional β chain sequence may be expressed. In some cases, within one T cell, only one functional γ chain sequence and one functional δ chain sequence may be expressed.

Chip-Based Oligonucleotide Synthesis: Opportunities and Challenges

Although chip-based high-throughput oligonucleotide synthesis technologies may have been progressing, to the point that hundreds of thousands or even millions of oligonucleotides with arbitrary sequences can be synthesized at once, the lengths of the oligonucleotides synthesized in this manner may be limited to about 200 to 300 bases long. In contrast, a full-length TCR construct can be nearly two kilobases long. At first glance, chip-based synthesis may seem insufficient to solve the TCR gene synthesis problem. However, examination of the structure of TCR can reveal opportunities. First, the constant regions of TCR alpha chain and beta chain (e.g., TRAC and TRBC) can be constant. Thus, the polynucleotide sequences encoding constant regions of TCR chains can be appended to the rest of the TCR sequences. Second, unlike BCR/antibody sequences, TCRs may not undergo somatic hypermutation, which means the sequences outside of CDR3 regions can be of germline origin. Therefore, polynucleotides, each comprising a sequence derived from a TCR V gene or a portion thereof, can be pre-synthesized. The sequence derived from a TCR V gene can be a portion of the TCR V gene. The sequence derived from a TCR V gene can be a codon-optimized sequence or comprise one or more modified nucleotides. For example, the sequence derived from the TCR V gene comprising coding sequences for L-PART1 (first part of the leader peptide), L-PART2 (second part of the leader peptide), FR1, CDR1, FR2, CDR2 and FR3, referred to as L-V-REGION, can be pre-synthesized. For another example, the sequence derived from the TCR V gene comprising coding sequences for FR1, CDR1, FR2, CDR2 and FR3, referred to as V-REGION, can be pre-synthesized. The nucleic acid sequence segment of L-PART1, L-PART2, FR1, CDR1, FR2, CDR2, or FR3, can be defined according to the IMGT unique numbering system (http://www.imgt.org).
In some cases, the sequence derived from the TCR V gene can comprise a sequence starting from the sequence encoding L-PART1 and ending at the codon encoding the second conserved cysteine (e.g., 2nd-CYS, as defined by IMGT, corresponds to the codon for the conserved cysteine at position 104 of the V-DOMAIN). Since there are about 80 or more TCR V genes (e.g., TRAV and TRBV genes) in the human genome, synthesis of such a “V gene germline polynucleotide library” (as shown in FIG. 8, 801 and bracket X) can be feasible. In some cases, a subset of TCR V genes of a species (e.g., a human or a mouse) are synthesized to generate the “V gene germline polynucleotide library.” All identified or a subset of TCR V genes may be synthesized. For example, at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80 or more TCR V genes of the species can be synthesized to generate the library. In some cases, all identified TCR V genes of a species are synthesized to generate the library. The TCR V gene can be TRAV, TRBV, TRGV, or TRDV. As described herein, in some cases, a “V gene germline polynucleotide” refers to a portion of the genomic or codon-optimized polynucleotide of a TCR V gene. The sequence derived from the TCR V gene can be the V gene germline polynucleotide. The sequence between FR3 and the constant region (e.g., CDR3 plus the remainder of the J region, referred to as “CDR3-J” herein) can be at least about 10, 20, 30, 40, 50, 60, 70, 80, 90 or more nucleotides long, or in some cases, can be up to about 90 nucleotides long. The CDR3-J sequence of the alpha chain and beta chain of a TCR can be at least about 50, 60, 70, 80, 90, 100, 120, 150, 180 or more nucleotides long.
The CDR3-J sequence of the alpha chain and beta chain of a TCR (in some cases, in total up to about 180 nucleotides long) can be included into an oligonucleotide (referred to as a “paired CDR3-J oligo”, a “paired CDR3-J oligonucleotide” or a “paired CDR3-J polynucleotide”, which can be used interchangeably) that can be amenable to chip-based synthesis (as shown in FIG. 8, 802 and bracket Y encompassing 802). In some cases, the paired CDR3-J polynucleotide can comprise a CDR3-J sequence of a TCR gamma chain and a CDR3-J sequence of a TCR delta chain. As used herein, the terms “CDR3-J polynucleotide,” “CDR3-J oligonucleotide,” and “CDR3-J oligo” (which can be used interchangeably) refer to a polynucleotide sequence comprising one or more CDR3-J sequences. A CDR3-J polynucleotide may be a paired CDR3-J polynucleotide (e.g., comprising CDR3-J sequences from paired TCR chains). The CDR3-J polynucleotide (e.g., non-paired) may contain only the CDR3-J sequence from one of the paired TCR chains. For example, the CDR3-J polynucleotide may contain only the CDR3-J sequence from a TCR alpha chain, a TCR beta chain, a TCR gamma chain, or a TCR delta chain. The remaining challenge can be to convert such paired CDR3-J oligonucleotides into expressible TCR constructs in high throughput (e.g., constructing >1,000 TCRs in one batch). Using the methods described herein, the paired CDR3-J oligonucleotides can be linked to their corresponding V gene germline polynucleotides in a bulk reaction (e.g., FIG. 8, 803). In some cases, the CDR3-J polynucleotide pool (e.g., paired or non-paired) can comprise at least about 2, 5, 10, 20, 50, 100, 200, 300, 400, 500, 1,000, 1,500, 2,000, 2,500, 3,000, 3,500, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 12,000, 15,000, 20,000, 100,000, 1,000,000, 10,000,000, or more different sequences.
The V gene germline polynucleotide library can comprise at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80 or more TCR V genes. When using the methods described herein, a plurality of at least 2, 5, 10, 20, 50, 100, 200, 300, 400, 500, 1,000, 1,500, 2,000, 2,500, 3,000, 3,500, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 12,000, 15,000, 20,000, 100,000, 1,000,000, 10,000,000, or more different sequences encoding different natively paired TCRs can be generated. The natively paired TCRs can be generated in bulk in a single compartment.
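The length argument in the preceding paragraph can be checked with a short calculation. The following Python sketch is illustrative only: the record type `PairedCdr3J`, the function `fits_on_chip`, the `overhead_nt` allowance for any non-CDR3-J bases, and the default limit of 200 bases (the lower end of the approximately 200 to 300 base range mentioned above) are assumptions of this sketch and are not part of the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedCdr3J:
    """Hypothetical record for one natively paired TCR, as reported by paired sequencing."""
    trav: str          # V gene identity of the alpha chain, e.g. "TRAV12-2"
    cdr3j_alpha: str   # CDR3 plus remainder of the J region, alpha chain (nucleotides)
    trbv: str          # V gene identity of the beta chain, e.g. "TRBV4-1"
    cdr3j_beta: str    # CDR3 plus remainder of the J region, beta chain (nucleotides)

    def payload_length(self) -> int:
        # Total CDR3-J length carried by one paired oligonucleotide.
        return len(self.cdr3j_alpha) + len(self.cdr3j_beta)


def fits_on_chip(record: PairedCdr3J, overhead_nt: int = 0, max_oligo_nt: int = 200) -> bool:
    """Return True if the paired CDR3-J payload plus any extra bases fits the synthesis limit."""
    return record.payload_length() + overhead_nt <= max_oligo_nt
```

With two CDR3-J sequences of 90 nucleotides each, the payload is 180 nucleotides, which fits a 200-base limit when no additional bases are needed but not when, for instance, 40 further bases are assumed.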

Examples of germline or rearranged gene constructs of a nucleic acid molecule comprising a TCR V gene sequence are shown in FIGS. 4A-4C. For example, FIG. 4A shows germline genomic DNA of a TCR V gene, comprising L-PART1, V-INTRON, V-EXON, and recombination signal sequences (V-HEPTAMER, V-SPACER, and V-NONAMER). The two conserved cysteines are also shown in FIG. 4A. After V-(D)-J recombination, an example construct of the rearranged genomic DNA is shown in FIG. 4B or FIG. 4C. The CDR3 can be encoded by (i) the junction (or junctional region) between V-REGION and J-REGION or (ii) the junction between V-REGION and D-REGION and the junction between D-REGION and J-REGION.

The TCR V genes can be very diverse. In humans, more than 40 functional V genes for TRA have been identified, including, for example, TRAV1-1, TRAV1-2, TRAV2, TRAV3, TRAV4, TRAV5, TRAV6, TRAV7, TRAV8-1, TRAV8-2, TRAV8-3, TRAV8-4, TRAV8-6, TRAV9-1, TRAV9-2, TRAV10, TRAV12-1, TRAV12-2, TRAV12-3, TRAV13-1, TRAV13-2, TRAV14, TRAV16, TRAV17, TRAV18, TRAV19, TRAV20, TRAV21, TRAV22, TRAV23, TRAV24, TRAV25, TRAV26-1, TRAV26-2, TRAV27, TRAV29, TRAV30, TRAV34, TRAV35, TRAV36, TRAV38-1, TRAV38-2, TRAV39, TRAV40, and TRAV41.

Among these V genes, some can be classified into the same subgroup; they are indicated by the same subgroup number immediately following “TRAV” but a different number following the “−” sign. For example, TRAV1-1 and TRAV1-2 are from the same subgroup. As used herein, a “group” is a set of genes that share the same “gene type” (e.g., V, D, J or C type) and participate potentially in the synthesis of a polypeptide of the same “chain type”. By extension, a group includes the related pseudogenes and orphans. A “subgroup” means a set of genes that belong to the same group, in a given species, and that share at least 75% identity at the nucleotide level (in the germline configuration for V, D, and J).
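The naming convention described above can be applied mechanically. The following Python sketch, offered only as an illustration, extracts the subgroup from a V gene name of the form used in the lists herein (e.g., TRAV1-2 or TRBV30) and groups names by subgroup. The function names and the regular expression are assumptions of this sketch; gene names of other forms are rejected rather than guessed at.

```python
import re
from collections import defaultdict

# Matches names such as "TRAV1-2" or "TRBV30": locus prefix, subgroup number,
# and an optional gene number after the hyphen. Other name forms are not handled.
_V_GENE_NAME = re.compile(r"^(TR[ABGD]V)(\d+)(?:-(\d+))?$")


def subgroup_of(gene_name: str) -> str:
    """Return the subgroup label, e.g. "TRAV1" for "TRAV1-2"."""
    match = _V_GENE_NAME.match(gene_name)
    if match is None:
        raise ValueError(f"unrecognized V gene name: {gene_name!r}")
    return match.group(1) + match.group(2)


def group_by_subgroup(gene_names):
    """Map each subgroup label to the gene names that belong to it, in input order."""
    groups = defaultdict(list)
    for name in gene_names:
        groups[subgroup_of(name)].append(name)
    return dict(groups)
```

For example, TRAV1-1 and TRAV1-2 both map to subgroup TRAV1, while TRAV2 forms a subgroup of its own.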

In humans, more than 40 functional V genes for TRB have been identified, including, for example, TRBV2, TRBV3-1, TRBV4-1, TRBV4-2, TRBV4-3, TRBV5-1, TRBV5-4, TRBV5-5, TRBV5-6, TRBV5-8, TRBV6-1, TRBV6-2, TRBV6-3, TRBV6-4, TRBV6-5, TRBV6-6, TRBV6-8, TRBV6-9, TRBV7-2, TRBV7-3, TRBV7-4, TRBV7-6, TRBV7-7, TRBV7-8, TRBV7-9, TRBV9, TRBV10-1, TRBV10-2, TRBV10-3, TRBV11-1, TRBV11-2, TRBV11-3, TRBV12-3, TRBV12-4, TRBV12-5, TRBV13, TRBV14, TRBV15, TRBV16, TRBV18, TRBV19, TRBV20-1, TRBV24-1, TRBV25-1, TRBV27, TRBV28, TRBV29-1, and TRBV30. V genes for other species, e.g., mouse, can be found in the IMGT database.

Diversify Connector Sequences

Connecting a V gene germline polynucleotide and a CDR3-J polynucleotide can be achieved by molecular biology techniques such as ligation and overlapping primer extension (FIG. 6). However, to fully utilize the power of chip-based oligonucleotide synthesis, one may connect thousands or more CDR3-J oligonucleotides with their corresponding V gene germline polynucleotides in a bulk reaction (as shown by the arrow of FIG. 8, 803). The major challenge in doing this can be that the connector region between the V gene germline polynucleotide (FIG. 6, 601) and the CDR3-J (FIG. 6, 602) may be the conserved FR3 region. Therefore, in a bulk reaction, it can be difficult to control which V gene germline polynucleotide is connected to which CDR3-J (as depicted in FIG. 5 where the solid arrow depicts linking to the correct V gene germline polynucleotide and the dashed arrows depict linking to the incorrect V gene germline polynucleotide). For example, a TCR sequence may be formed by TRBV4-1 connected to a particular CDR3-J beta sequence. In the bulk reaction, the V gene germline polynucleotides for both TRBV4-1 and TRBV4-2 can be present, and the FR3 regions for these TRBV genes can be highly similar. Therefore, the CDR3-J oligonucleotide for this TCR may be incorrectly connected to the TRBV4-2 germline polynucleotide. To alleviate this problem, codon diversification can be used to create dissimilarities among different FR3 sequences. For example, the connector sequences can be codon-diversified such that they can have different nucleic acid sequences, even though they may encode an identical amino acid sequence. Codon diversification can be achieved by computational methods such as the method shown in Example 2.
A plurality of nucleic acid sequences can be generated by assigning a codon to an amino acid randomly or according to an arbitrary rule, where each nucleic acid sequence can encode the same amino acid sequence. Next, the plurality of nucleic acid sequences can be evaluated computationally to assign a score according to an arbitrary rule. The arbitrary rule may consider factors such as restriction site, propensity to hybridize with an unwanted sequence, propensity to hybridize with a given sequence, or unwanted secondary structure in the sequence. Next, based on the score, a nucleic acid sequence can be selected from the plurality of nucleic acid sequences as a codon-diversified connector sequence. The codon-diversified connector sequence can be used to achieve correct linking of a CDR3-J polynucleotide and the designated pre-synthesized portion of a TCR V gene. In a “V gene germline polynucleotide library” comprising some or all the known TCR V genes, for example, each different TCR V gene can have a different connector sequence, which can be used to correctly connect to the corresponding CDR3-J oligonucleotide to form a TCR chain according to a reference sequence. The reference sequence can be generated by sequencing cognate pairs of TCR chains. However, in some cases, it may be unclear to what extent the connector sequences can be diversified and to what extent the connection between the V gene germline polynucleotides and CDR3-J oligonucleotides can be correct in a bulk reaction. As shown in Example 2, it may be possible to diversify the FR3 regions of human TCR V genes so that the ‘mis-connection probability’ for any given CDR3-J sequence is practically undetectable. The algorithm set out in Example 2 can be used to generate ‘codon-diversified V gene germline polynucleotides’ and their corresponding CDR3-J sequences.
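The generate, score and select steps described above can be sketched as follows. This Python sketch is not the algorithm of Example 2; it is a simplified illustration under stated assumptions: candidates are drawn by assigning synonymous codons at random, the score is the smallest number of mismatched positions relative to connectors already chosen (a crude stand-in for propensity to hybridize with an unwanted sequence), and candidates containing a listed restriction site are discarded. The function names, the greedy selection order, and the parameters `n_candidates`, `forbidden_sites` and `seed` are hypothetical.

```python
import random

# Standard genetic code, listed as synonymous codons per amino acid (stop codons omitted).
CODONS = {
    "A": ["GCT", "GCC", "GCA", "GCG"], "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "N": ["AAT", "AAC"], "D": ["GAT", "GAC"], "C": ["TGT", "TGC"],
    "Q": ["CAA", "CAG"], "E": ["GAA", "GAG"], "G": ["GGT", "GGC", "GGA", "GGG"],
    "H": ["CAT", "CAC"], "I": ["ATT", "ATC", "ATA"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"], "K": ["AAA", "AAG"],
    "M": ["ATG"], "F": ["TTT", "TTC"], "P": ["CCT", "CCC", "CCA", "CCG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"], "T": ["ACT", "ACC", "ACA", "ACG"],
    "W": ["TGG"], "Y": ["TAT", "TAC"], "V": ["GTT", "GTC", "GTA", "GTG"],
}
CODON_TO_AA = {codon: aa for aa, codons in CODONS.items() for codon in codons}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence (no stop codons) into one-letter amino acids."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def mismatches(a: str, b: str) -> int:
    """Number of differing positions; a length difference counts as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def diversify_connectors(peptides, n_candidates=200, forbidden_sites=(), seed=0):
    """Greedily pick, for each V gene, the candidate least similar to connectors chosen so far.

    `peptides` maps a V gene name to the amino acid sequence its connector must encode.
    """
    rng = random.Random(seed)
    chosen = {}
    for gene, peptide in peptides.items():
        best, best_score = None, -1
        for _ in range(n_candidates):
            # Generate: assign a synonymous codon to each amino acid at random.
            candidate = "".join(rng.choice(CODONS[aa]) for aa in peptide)
            if any(site in candidate for site in forbidden_sites):
                continue  # rule out candidates carrying an unwanted restriction site
            # Score: distance to the most similar connector already selected.
            score = min((mismatches(candidate, s) for s in chosen.values()),
                        default=len(candidate))
            if score > best_score:
                best, best_score = candidate, score
        if best is None:
            raise ValueError(f"no acceptable candidate found for {gene}")
        chosen[gene] = best  # Select: keep the best-scoring candidate
    return chosen
```

In this sketch, two V genes whose connectors encode the same amino acid sequence receive different nucleic acid sequences, which is the property the bulk linking reaction relies on. A practical scoring rule would likely model hybridization more faithfully than a mismatch count.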

Once a diverse set of connector sequences is found, many methods using molecular biology techniques (e.g., ligation, restriction digestion, circularization) can be used to convert a CDR3-J oligonucleotide pool to a full-length, expressible TCR pool. Example 1 provides an example workflow. The methods provided herein can also be used to generate a pool of individual TCR chains (e.g., not paired chains) in a bulk reaction. For example, to generate a pool of TCR alpha chains, each individual CDR3-J oligonucleotide may comprise the CDR3 and J region from a TCR alpha chain but may not comprise another CDR3 and J region from a TCR beta chain, and then the CDR3-J oligonucleotide can be linked with the corresponding TRAV gene to form the TCR alpha chain.

Methods for Constructing Nucleic Acid Molecules Encoding TCRs

The nucleic acid molecules encoding TCRs described herein can be constructed from two or more nucleic acid fragments. In some embodiments, the two or more nucleic acid fragments can be referred to as a first nucleic acid molecule, a second nucleic acid molecule, a third nucleic acid molecule, a fourth nucleic acid molecule, etc. When constructing the nucleic acid molecules, standard molecular biology techniques, including but not limited to hybridization, extension, ligation, and enzymatic digestion/cleavage, may be used.

The nucleic acid fragment described herein can encode a TCR chain or portion thereof. For example, the portion of the TCR chain encoded by the nucleic acid fragment can comprise greater than or equal to about 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 200, 250, or more amino acids. The nucleic acid fragment can comprise a sequence encoding a functional TCR chain. The functional TCR chain may or may not be a full length TCR chain. The functional TCR chain may comprise one or more mutations or modifications. In some cases, a functional TCR chain, when expressed in a host cell, can incorporate into a TCR complex (e.g., a complex having TCRα, TCRβ, CD3γ, CD3δ, CD3ε, and ζ chains). In some cases, a functional TCR can bind to its target ligand. In some cases, a functional TCR, when expressed in a host cell, can incorporate into the cell membrane. In some cases, a functional TCR can be expressed in a host cell.

The nucleic acid fragment used to construct a nucleic acid molecule encoding a TCR or portion thereof can comprise a sequence encoding a CDR3.

The nucleic acid fragment used to construct a nucleic acid molecule encoding a TCR or portion thereof can comprise a sequence encoding a first CDR3 of a first TCR chain and a second CDR3 of a second TCR chain, wherein the first CDR3 and the second CDR3 are derived from a cognate pair of TCR chains. In some embodiments, the sequence encoding the first CDR3 and the sequence encoding the second CDR3 are separated by at most about 100, 90, 80, 70, 60, 50, 40, 30, 20, 10, or 5 nucleotides.

The nucleic acid fragment used to construct a nucleic acid molecule encoding a TCR or portion thereof can comprise a TCR V gene sequence or portion thereof. The nucleic acid fragment used to construct a nucleic acid molecule encoding a TCR or portion thereof can comprise a sequence derived from a TCR V gene sequence. The sequence derived from a TCR V gene can comprise a V-REGION nucleic acid sequence. The sequence derived from a TCR V gene can comprise a sequence encoding FR1, CDR1, FR2, CDR2 and/or FR3. The sequence derived from a TCR V gene can comprise a sequence encoding a leader peptide. The sequence derived from a TCR V gene can comprise a sequence encoding L-PART1, L-PART2, FR1, CDR1, FR2, CDR2 and/or FR3. The sequence derived from a TCR V gene can comprise or can be a portion of the TCR V gene. The portion of the TCR V gene can be at least 10 nucleotides in length. For example, the portion of the TCR V gene may be greater than or equal to about 10, 20, 30, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, or more nucleotides in length. The sequence derived from a TCR V gene may comprise one or more modified nucleotides. The sequence derived from a TCR V gene may be codon-optimized (or codon-diversified) such that it has a different sequence than the TCR V gene or portion thereof but can encode the same amino acid sequence. The sequence derived from a TCR V gene may not comprise a sequence encoding a portion of a CDR3. The sequence derived from a TCR V gene may not comprise a sequence of a junctional region of a rearranged gene.

The nucleic acid fragment used to construct a nucleic acid molecule encoding a TCR or portion thereof can comprise a sequence encoding a constant domain or portion thereof. The nucleic acid fragment used to construct a nucleic acid molecule encoding a TCR or portion thereof can comprise a sequence encoding a constant region or portion thereof. In some cases, the constant domain or constant region is a TCR alpha constant domain or constant region, a TCR beta constant domain or constant region, a TCR gamma constant domain or constant region, or a TCR delta constant domain or constant region. In some cases, the constant region comprises a constant domain. In some cases, the constant region further comprises a transmembrane region, a connecting region, a cytoplasmic region, or a combination thereof.

The nucleic acid fragment used to construct a nucleic acid molecule encoding a TCR or a portion thereof can comprise a connector sequence. The connector sequence can be used to link one nucleic acid molecule to another nucleic acid molecule. The connector sequence of one nucleic acid molecule can hybridize (e.g., form a base pair or base pairs) with an anti-connector sequence of another nucleic acid molecule. The anti-connector sequence can be complementary (e.g., fully or substantially complementary) with the connector sequence. The anti-connector sequence can be hybridizable with the connector sequence under certain conditions (e.g., temperature, buffer condition, pH, etc.). The anti-connector sequence can be a reverse complement sequence (or complementary sequence) of the connector sequence. When the connector sequence hybridizes with the anti-connector sequence, the base pair(s) formed can be at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 100, or more base pairs. The base pairs formed between the connector sequence and the anti-connector sequence can be contiguous or non-contiguous. For example, in the cases where non-contiguous base pairs are formed, there may be an unpaired region or regions separating paired regions. If a first nucleic acid molecule comprises a connector sequence, then a complementary sequence of the connector sequence on a second nucleic acid molecule can be referred to as an anti-connector sequence. The connector sequence (or anti-connector sequence) can be of various lengths. For example, the connector sequence (or anti-connector sequence) can be greater than or equal to about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 100, or more nucleotides in length.
The connector sequence (or anti-connector sequence) can be less than or equal to about 300, 250, 200, 150, 100, 90, 85, 80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30, 25, 20, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3 or 2 nucleotides in length. The connector sequence (or anti-connector sequence) can be at the 5′ end or 3′ end of a nucleic acid molecule. The connector sequence (or anti-connector sequence) can also be an internal sequence of a nucleic acid molecule. For example, the connector sequence can be an internal connector sequence and can be exposed at the 5′ end or 3′ end by cutting an internal sequence (e.g., a sequence adjacent to the internal connector sequence) of the nucleic acid molecule. An example of the internal connector sequence is provided in Example 1, the inter-chain connector (ICC). In some cases, a connector sequence and an anti-connector sequence are used to link a nucleic acid molecule encoding a CDR3 or a portion thereof of a TCR chain with another nucleic acid molecule comprising a TCR V gene or a portion thereof. In some cases, a connector sequence and an anti-connector sequence are used to link a nucleic acid molecule comprising a J region of a TCR with another nucleic acid molecule comprising a TCR V gene or a portion thereof. In some cases, a connector sequence and an anti-connector sequence are used to link a nucleic acid molecule comprising a sequence encoding a CDR3 or a portion thereof and a J region of a TCR with another nucleic acid molecule comprising a TCR V gene or a portion thereof. In some cases, a connector sequence and an anti-connector sequence are used to link a nucleic acid molecule comprising a sequence encoding a CDR3 or a portion thereof, a J region, and a TCR V gene or a portion thereof with another nucleic acid molecule encoding a constant domain or a portion thereof of a TCR.
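As a non-limiting illustration of the connector and anti-connector relationship described above, the following minimal sketch computes a reverse complement and pairs fragments whose anti-connector is the exact reverse complement of a connector. It assumes full complementarity only (substantially complementary or non-contiguous pairing is not modeled), and the function names, fragment names, and sequences are hypothetical and are not taken from the disclosure.

```python
# Illustrative sketch only; all names and sequences are hypothetical.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_anti_connector(connector: str, candidate: str) -> bool:
    """True if candidate is the exact reverse complement of connector."""
    return candidate.upper() == reverse_complement(connector).upper()


def pair_fragments(connectors: dict, anti_connectors: dict) -> dict:
    """Map each connector-bearing fragment name to the name of the fragment
    whose anti-connector is fully complementary to it (None if no match)."""
    by_sequence = {seq.upper(): name for name, seq in anti_connectors.items()}
    return {
        name: by_sequence.get(reverse_complement(seq).upper())
        for name, seq in connectors.items()
    }


if __name__ == "__main__":
    cdr3_fragments = {"cdr3_j_1": "AAGGCT", "cdr3_j_2": "CCATGA"}
    v_gene_fragments = {"v_gene_a": "AGCCTT", "v_gene_b": "TCATGG"}
    print(pair_fragments(cdr3_fragments, v_gene_fragments))
```

In such a sketch, each CDR3-bearing fragment is matched to one V gene fragment solely through its connector, which mirrors how specific hybridization can direct linking in a pooled reaction.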

The connector sequence (or the anti-connector sequence) can be a sequence encoding a portion of a TCR V gene (e.g., the portion of the TCR V gene adjacent to the sequence encoding a CDR3 in the rearranged gene). In such cases, the connector sequence and one or more other connector sequences in a pool of connector sequences may encode the same amino acid sequence (e.g., the conserved portion of the TCR V gene adjacent to the CDR3). When the connector sequence encodes a conserved portion of a TCR V gene, the connector sequence can be codon-diversified such that the connector sequence can be used to link a nucleic acid molecule to another nucleic acid molecule specifically, resulting in a constructed nucleic acid molecule encoding a cognate pair of a TCR. In some embodiments, the connector sequence (or anti-connector sequence) comprises at least 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 50, 60, 70, 80, 90, 100, 150, 200, or more nucleotides of the TCR V gene adjacent to a sequence encoding a CDR3 in a rearranged gene. Because of the specificity of the connector sequence and the anti-connector sequence, a pool of nucleic acid molecules having different sequences which encode different TCRs can be constructed in a bulk reaction (e.g., in a same compartment). The connector sequence can prescribe which TCR V gene the sequence encoding a CDR3 should be linked to according to a reference sequence (e.g., a native sequence of a TCR chain determined by sequencing). Alternatively, the connector sequence (or the anti-connector sequence) can be an arbitrary (e.g., pre-determined) sequence which may not encode a portion of a TCR V gene. In such cases, the arbitrary sequence can be removed after linking two nucleic acid fragments together.
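As a non-limiting illustration of codon diversification, the following minimal sketch enumerates synonymous DNA sequences that encode the same short peptide under the standard genetic code and greedily keeps a pool whose members differ from one another at a minimum number of nucleotide positions. The peptide "CAS", the pool size, the Hamming-distance threshold, and all function names are hypothetical choices made for illustration; the disclosure does not prescribe this selection procedure.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}
AA_TO_CODONS = {}
for _codon, _aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(_aa, []).append(_codon)


def translate(dna: str) -> str:
    """Translate a DNA sequence (reading frame 0) into one-letter amino acids."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, usable, 3))


def synonymous_variants(peptide: str):
    """Yield every DNA sequence that encodes the given peptide."""
    for codons in product(*(AA_TO_CODONS[aa] for aa in peptide)):
        yield "".join(codons)


def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def diversified_pool(peptide: str, pool_size: int, min_distance: int) -> list:
    """Greedily collect synonymous sequences that are mutually distinct
    by at least min_distance nucleotides (hypothetical selection rule)."""
    pool = []
    for candidate in synonymous_variants(peptide):
        if all(hamming(candidate, kept) >= min_distance for kept in pool):
            pool.append(candidate)
            if len(pool) == pool_size:
                break
    return pool


if __name__ == "__main__":
    for connector in diversified_pool("CAS", pool_size=4, min_distance=2):
        print(connector, translate(connector))
```

A larger minimum distance would be expected to make the connectors easier to discriminate by hybridization, at the cost of fewer usable variants per amino acid sequence.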

FIG. 7 depicts an example of using arbitrary connector (701) and anti-connector (702) sequences to link a CDR3-J polynucleotide to a designated V gene germline polynucleotide (thin arrow). Here each V gene germline polynucleotide has a partially double-stranded structure. The top strand, with its 3′ end to its right in this figure, has a single-stranded region at its 3′ end. The connector and the anti-connector sequences may be single stranded and may hybridize to each other. The connector and anti-connector sequences serve only the purpose of specific hybridization and may not be related to a TCR whatsoever, hence arbitrary. After the hybridization between the connector and the anti-connector, the 3′ end of the top strand of the V gene germline polynucleotide may hybridize to the CDR3-J polynucleotide and may be extended by a DNA polymerase. The number of nucleotides on the 3′ end of the top strand of the V gene germline polynucleotide that are hybridized to the CDR3-J polynucleotide may be 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15, or up to 20.

The nucleic acid fragment used to construct a nucleic acid molecule encoding a TCR or a portion thereof can comprise a self-cleaving peptide. The self-cleaving peptide can be a 2A peptide, an intein peptide, or a hedgehog peptide. Examples of 2A peptides include, but are not limited to, P2A (e.g., sequence ATNFSLLKQAGDVEENPGP (SEQ ID NO: 284)), E2A (e.g., sequence QCTNYALLKLAGDVESNPGP (SEQ ID NO: 285)), F2A (e.g., sequence VKQTLNFDLLKLAGDVESNPGP (SEQ ID NO: 286)), and T2A (e.g., sequence EGRGSLLTCGDVEENPGP (SEQ ID NO: 287)) peptides.

The nucleic acid fragment used to construct a nucleic acid molecule encoding a TCR or a portion thereof can comprise a restriction enzyme recognition site. For example, the restriction enzyme recognition site can be a recognition site for a Type IIS restriction enzyme. Examples of Type IIS restriction enzymes which can be useful in the present disclosure include, but are not limited to, EarI, MnlI, PleI, AlwI, BbsI, BbvI, BcoDI, BsaI, BseRI, BsmAI, BsmBI, BspMI, Esp3I, HgaI, SapI, SfaNI, BsmFI, BsrDI, BtsI, FokI, HphI, MlyI and MboII. In some cases, two or more different restriction enzymes can be used during the nucleic acid construction process. In some cases, a restriction enzyme that creates a 4-bp 5′ overhang (for example, BbsI, BbvI, BcoDI, BsaI, BsmBI, FokI, etc.) can be used. In some cases, a restriction enzyme that creates a blunt end or 3′ overhang (for example, BseRI, BsrDI, BtsI, MlyI, etc.) can be used.
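As a non-limiting illustration of how a Type IIS enzyme cuts outside its recognition site, the following minimal sketch locates BsaI sites on the top strand of a sequence and reports the 4-nucleotide 5′ overhang that cutting would expose. It relies on the known BsaI geometry (recognition sequence GGTCTC, with the top strand cut one nucleotide downstream of the site and the bottom strand cut five nucleotides downstream). The function name and the example sequence are hypothetical, and sites on the reverse strand are deliberately not searched in this sketch.

```python
def bsai_overhangs(seq: str) -> list:
    """Return (site_start, overhang) for each top-strand BsaI site.

    BsaI recognizes GGTCTC and cuts 1 nt downstream on the top strand
    and 5 nt downstream on the bottom strand, so the 4 nt between the
    two cut positions form a 5' overhang. Reverse-strand sites
    (GAGACC on the top strand) are not handled here.
    """
    site = "GGTCTC"
    seq = seq.upper()
    results = []
    start = seq.find(site)
    while start != -1:
        cut_top = start + len(site) + 1      # top-strand cut position
        overhang = seq[cut_top:cut_top + 4]  # bases exposed as the overhang
        if len(overhang) == 4:               # skip sites too close to the end
            results.append((start, overhang))
        start = seq.find(site, start + 1)
    return results


if __name__ == "__main__":
    # Hypothetical fragment containing a single BsaI site.
    print(bsai_overhangs("AAGGTCTCAGCTTCCC"))
```

Because the overhang sequence is determined by the bases flanking the recognition site rather than by the site itself, different fragments cut with the same enzyme can carry different, user-defined overhangs.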

A nucleic acid fragment used to construct a nucleic acid molecule encoding a TCR or a portion thereof can be circularized. For example, the nucleic acid fragment can be circularized by joining two ends of the nucleic acid fragment by ligation. The ligation can be blunt end ligation. The ligation can be performed after creating sticky ends using a 5′-to-3′ exonuclease (e.g., Gibson Assembly), a 3′-to-5′ exonuclease (e.g., sequence and ligase independent cloning or SLIC), or a USER enzyme mix (e.g., USER friendly DNA recombination or USERec). Additional examples of circularization methods include, but are not limited to, circular polymerase extension cloning (CPEC) and seamless ligation cloning extract (SLiCE) assembly. Alternatively, these two ends can be joined by overlapping PCR. A variety of ligases can be used for ligation, including, but not limited to, T4 DNA ligase, T4 RNA ligase, and E. coli DNA ligase.

The nucleic acid fragment used to construct the nucleic acid molecule encoding a TCR chain or portion thereof can be synthesized chemically. For example, the nucleic acid fragment can be pre-synthesized by chip-based synthesis. In some cases, the nucleic acid fragment synthesized can be equal to or greater than about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, or more nucleotides in length. In some cases, the nucleic acid fragment synthesized can be equal to or less than about 500, 450, 400, 350, 300, 250, 200, 150, 100, 90, 80, 70, 60, 50, 40, 30, 20, or 10 nucleotides in length.

The two nucleic acid sequences encoding two peptide chains of a TCR can be constructed in several orientations, for example, head-to-head, head-to-tail, and tail-to-tail. As described herein, “head” refers to the “5′ end” of a sense nucleic acid strand and “tail” refers to the “3′ end” of a sense nucleic acid strand. In some cases where the orientation is head-to-tail, the order of the paired nucleic acid sequences encoding a TCR (e.g., TRA followed by TRB, or TRB followed by TRA) can be controlled.

Any nucleic acid molecule described herein can be a double-stranded nucleic acid molecule or single-stranded nucleic acid molecule. In some cases, a nucleic acid molecule may comprise a double-stranded region and a single-stranded region. For example, the nucleic acid molecule having a connector sequence or anti-connector sequence may be a double-stranded nucleic acid molecule having the connector sequence or anti-connector sequence region as a single-stranded region (e.g., an overhang or sticky end). The overhang can be at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more nucleotides long. The overhang can be at the 5′ end or 3′ end of a nucleic acid molecule.

Any nucleic acid molecule described herein can comprise one or more modified nucleotides. Examples of modified nucleotides include, but are not limited to, diaminopurine, 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine, 4-acetylcytosine, 5-(carboxyhydroxylmethyl)uracil, 5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine, 7-methylguanine, 5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 5′-methoxycarboxymethyluracil, 5-methoxyuracil, 2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid methylester, 3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, 2,6-diaminopurine and the like. In some cases, nucleotides may include modifications in their phosphate moieties, including modifications to a triphosphate moiety. Non-limiting examples of such modifications include phosphate chains of greater length (e.g., a phosphate chain having 4, 5, 6, 7, 8, 9, 10 or more phosphate moieties) and modifications with thiol moieties (e.g., alpha-thiotriphosphate and beta-thiotriphosphates).
Nucleic acid molecules may also be modified at the base moiety (e.g., at one or more atoms that typically are available to form a hydrogen bond with a complementary nucleotide and/or at one or more atoms that are not typically capable of forming a hydrogen bond with a complementary nucleotide), sugar moiety or phosphate backbone. Nucleic acid molecules may also contain amine-modified groups, such as aminoallyl-dUTP (aa-dUTP) and aminohexylacrylamide-dCTP (aha-dCTP) to allow covalent attachment of amine reactive moieties, such as N-hydroxysuccinimide esters (NHS). Alternatives to standard DNA base pairs or RNA base pairs in the oligonucleotides of the present disclosure can provide higher density in bits per cubic mm, higher safety (resistant to accidental or purposeful synthesis of natural toxins), easier discrimination in photo-programmed polymerases, or lower secondary structure. Such alternative base pairs can be compatible with natural and mutant polymerases for de novo and/or amplification synthesis.

An example workflow of constructing nucleic acid molecules encoding TCRs is shown in FIGS. 1A-1C. A plurality of cognate pairs of TCRs can be pre-determined using various existing methods (e.g., single cell barcoding and sequencing) prior to using the methods described herein to construct the nucleic acid molecules encoding TCRs. Various sequencing methods can be used to determine sequences of paired TCR chains, for example, Sanger sequencing, high-throughput sequencing, sequencing-by-synthesis, single-molecule sequencing, sequencing-by-ligation, RNA-Seq (Illumina), Next generation sequencing, Digital Gene Expression (Helicos), Clonal Single MicroArray (Solexa), shotgun sequencing, Maxam-Gilbert sequencing, or massively-parallel sequencing. The paired sequences from the sequencing library can serve as reference sequences for the cognate pairs of TCR chains such that one can know which CDR3 is paired with which V gene through specific interactions between a connector sequence and an anti-connector sequence. A plurality of nucleic acid molecules encoding different TCRs can be constructed in bulk using the methods described herein, but the construction of one molecule is shown in FIGS. 1A-1C as an example. A first nucleic acid molecule comprising a sequence encoding a first CDR3 (e.g., CDR3α) and a second CDR3 (e.g., CDR3β) can be contacted with a second nucleic acid molecule comprising a sequence derived from a first TCR V gene (e.g., TRAV). The connector sequence (e.g., ConA #*) of the first nucleic acid molecule can hybridize with the anti-connector sequence (ConA #) of the second nucleic acid molecule to link the two nucleic acid molecules. Extension and ligation can be performed to generate a third nucleic acid molecule comprising the sequence derived from the first TCR V gene and the sequence encoding the first CDR3 and the second CDR3. Next, a restriction enzyme (e.g., TIISRE1 of FIG. 1A) can be used to generate an overhang (or sticky end) of the third nucleic acid molecule.
Next, the third nucleic acid molecule can be contacted with a fourth nucleic acid molecule comprising a sequence encoding a first constant region or constant domain (e.g., TRBC). The third nucleic acid molecule can then be ligated to the fourth nucleic acid molecule through the overhang to generate a fifth nucleic acid molecule comprising the sequence derived from the first TCR V gene, the sequence encoding the first CDR3 and the second CDR3, and the sequence encoding the first constant region. The fifth nucleic acid molecule can be circularized and cut with a restriction enzyme (e.g., TIISRE3) to expose an internal connector sequence (e.g., ICC). Next, the fifth nucleic acid molecule can be contacted with a sixth nucleic acid molecule comprising a sequence derived from a second TCR V gene (e.g., TRBV). The fifth nucleic acid molecule can be ligated to the sixth nucleic acid molecule through the interaction between a connector sequence and an anti-connector sequence. Next, the sixth nucleic acid molecule can be cut by a restriction enzyme (e.g., TIISRE2) to generate an overhang. Next, the sixth nucleic acid molecule can be contacted with a seventh nucleic acid molecule comprising a sequence encoding a second constant region or constant domain (e.g., TRAC). The sixth nucleic acid molecule and the seventh nucleic acid molecule can be ligated to form an eighth nucleic acid molecule comprising all regions encoding paired TCR chains. The eighth nucleic acid molecule can be further constructed into an expression vector for TCR chain expression in a host cell. It should be understood that the nucleic acid fragment comprising the sequence derived from a TCR V gene may be single-stranded and, in such a case, the 3′ end of the connector sequence of the nucleic acid fragment encoding the CDR3 can be extended upon hybridizing with the anti-connector sequence.

The methods described herein can be used to generate a pool of individual TCR chains, for example, a pool of TCR alpha chains or TCR beta chains.

The methods for generating a plurality of nucleic acid molecules described herein can comprise providing a first plurality of nucleic acid molecules (or nucleic acid fragments). A nucleic acid molecule of the first plurality of nucleic acid molecules can comprise a sequence encoding a first CDR3 of a first T-cell receptor (TCR) chain and a second CDR3 of a second TCR chain. The first CDR3 and the second CDR3 can be from a cognate pair of TCR chains. Next, a second plurality of nucleic acid molecules can be provided. A nucleic acid molecule of the second plurality of nucleic acid molecules can comprise a sequence derived from a TCR V gene. The nucleic acid molecule may not comprise a sequence encoding a constant domain. Next, the first plurality of nucleic acid molecules and the second plurality of nucleic acid molecules can be contacted. The nucleic acid molecule of the first plurality of nucleic acid molecules can link with the nucleic acid molecule of the second plurality of nucleic acid molecules to form a nucleic acid molecule comprising the sequence encoding the first CDR3 and the second CDR3 and the sequence derived from the TCR V gene. The sequence encoding the first CDR3 and the second CDR3 and the TCR V gene can be derived from the cognate pair of TCR chains.

The method for generating a plurality of nucleic acid molecules, each nucleic acid molecule of the plurality encoding a T-cell receptor (TCR) chain or region thereof, can comprise contacting a first plurality of nucleic acid molecules and a second plurality of nucleic acid molecules to generate a third plurality of nucleic acid molecules comprising at least two different nucleic acid molecules. Each of the at least two different nucleic acid molecules can have a different sequence encoding a different TCR chain or region thereof. The at least two different nucleic acid molecules can be generated in a same compartment. In some cases, at least about 5, 10, 20, 50, 100, 200, 300, 400, 500, 1,000, 1,500, 2,000, 2,500, 3,000, 3,500, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 12,000, 15,000, 20,000, 100,000, 1,000,000, 10,000,000, or more different sequences encoding different TCRs can be generated in the same compartment.

The method for generating a plurality of nucleic acid molecules described herein can comprise providing a first plurality of nucleic acid molecules. A nucleic acid molecule of the first plurality of nucleic acid molecules can comprise a sequence encoding a first CDR3 of a first T-cell receptor (TCR) chain and a second CDR3 of a second TCR chain. The first CDR3 and the second CDR3 can be from a cognate pair of TCR chains. Next, a second plurality of nucleic acid molecules can be provided. A nucleic acid molecule of the second plurality of nucleic acid molecules can comprise a sequence derived from a TCR V gene. Next, the first plurality of nucleic acid molecules and the second plurality of nucleic acid molecules can be contacted. The nucleic acid molecule of the first plurality of nucleic acid molecules can link with the nucleic acid molecule of the second plurality of nucleic acid molecules to form a linear nucleic acid molecule comprising the sequence encoding the first CDR3 and the second CDR3 and the sequence derived from the TCR V gene. The sequence encoding the first CDR3 and the second CDR3 and the TCR V gene can be derived from the cognate pair of TCR chains.

The method for generating a plurality of nucleic acid molecules can comprise providing a first plurality of nucleic acid molecules. A nucleic acid molecule of the first plurality of nucleic acid molecules can comprise (i) a synthetic sequence encoding a first CDR3 of a first T-cell receptor (TCR) chain and a second CDR3 of a second TCR chain and (ii) a synthetic sequence encoding a third CDR3 of a third T-cell receptor (TCR) chain and a fourth CDR3 of a fourth TCR chain. The first CDR3 and the second CDR3 can be from a first cognate pair of TCR chains and the third CDR3 and the fourth CDR3 can be from a second cognate pair of TCR chains. Next, a second plurality of nucleic acid molecules can be provided. A nucleic acid molecule of the second plurality of nucleic acid molecules can comprise a sequence derived from a TCR V gene. Next, the first plurality of nucleic acid molecules and the second plurality of nucleic acid molecules can be contacted. The nucleic acid molecule of the first plurality of nucleic acid molecules can link with the nucleic acid molecule of the second plurality of nucleic acid molecules to form a nucleic acid molecule comprising the sequence encoding the first CDR3 and the second CDR3 and the sequence derived from the TCR V gene. The sequence encoding the first CDR3 and the second CDR3 and the TCR V gene can be derived from the cognate pair of TCR chains.

The method for generating a nucleic acid molecule encoding a T-cell receptor (TCR) chain or portion thereof can comprise providing at least one nucleic acid molecule comprising a sequence encoding a CDR3 of a TCR chain. Next, a plurality of nucleic acid molecules can be provided. Each nucleic acid molecule of the plurality can comprise a sequence derived from a TCR V gene. The plurality of nucleic acid molecules can comprise at least two different sequences derived from at least two different TCR V genes. In some cases, the plurality of nucleic acid molecules can comprise at least 2, 5, 10, 15, 20, 25, 30, 35, 40 or more different sequences derived from at least 2, 5, 10, 15, 20, 25, 30, 35, 40 or more different TCR V genes. Next, the at least one nucleic acid molecule comprising a sequence encoding a CDR3 of a TCR chain can be contacted with the plurality of nucleic acid molecules, each comprising a sequence derived from a TCR V gene, in a same compartment. The at least one nucleic acid molecule comprising a sequence encoding a CDR3 of a TCR chain can be capable of linking to a nucleic acid molecule of the plurality of nucleic acid molecules to generate a third nucleic acid molecule comprising the sequence encoding the CDR3 and a sequence derived from one of the at least two different TCR V genes, thereby generating the nucleic acid molecule encoding the TCR chain or portion thereof.

The composition described herein that can be used for the methods described herein can comprise a first plurality of nucleic acid molecules. Each nucleic acid molecule of the first plurality of nucleic acid molecules can comprise a sequence encoding a first CDR3 of a first T-cell receptor (TCR) chain and a second CDR3 of a second TCR chain. The first CDR3 and the second CDR3 can be from a cognate pair of TCR chains. The composition can further comprise a second plurality of nucleic acid molecules. Each nucleic acid molecule of the second plurality of nucleic acid molecules can comprise a sequence derived from a TCR V gene. Each nucleic acid molecule of the second plurality of nucleic acid molecules may not comprise a sequence encoding the first CDR3 and the second CDR3. In this composition, (i) each nucleic acid molecule of the first plurality of nucleic acid molecules can comprise a sequence encoding a different first CDR3 and/or second CDR3, and/or (ii) each nucleic acid molecule of the second plurality of nucleic acid molecules comprises a sequence derived from a different TCR V gene.

The composition described herein that can be used for the methods described herein can comprise a plurality of nucleic acid molecules. Each nucleic acid molecule of the plurality of nucleic acid molecules can comprise a sequence derived from a T-cell receptor (TCR) V gene. The plurality of nucleic acid molecules can comprise a first nucleic acid molecule having a first connector sequence and a second nucleic acid molecule having a second connector sequence. The first connector sequence can be different from the second connector sequence.

The composition described herein that can be used for the methods described herein can comprise a plurality of nucleic acid molecules. Each nucleic acid molecule of the plurality of nucleic acid molecules can encode a CDR3 of a T-cell receptor (TCR) chain. A first nucleic acid molecule of the plurality can comprise a first connector sequence and a second nucleic acid molecule of the plurality can comprise a second connector sequence. The first connector sequence can be different from the second connector sequence.

The composition described herein that can be used for the methods described herein can comprise a plurality of nucleic acid molecules. Each nucleic acid molecule of the plurality can comprise a sequence encoding at least ten amino acids of a T-cell receptor (TCR) chain. A first nucleic acid molecule of the plurality can comprise a first connector sequence and a second nucleic acid molecule of the plurality can comprise a second connector sequence. The first connector sequence can be different from the second connector sequence. The first connector sequence or the second connector sequence can encode a portion of a TCR chain. The first connector sequence or the second connector sequence can be in frame with the sequence encoding at least ten amino acids of a TCR chain.

The composition described herein that can be used for the methods described herein can comprise a plurality of nucleic acid molecules. Each nucleic acid molecule of the plurality of nucleic acid molecules can comprise a sequence derived from a T-cell receptor (TCR) V gene and may not comprise a CDR3 sequence. A first nucleic acid molecule of the plurality can comprise a first anti-connector sequence and a second nucleic acid molecule of the plurality can comprise a second anti-connector sequence. The first anti-connector sequence can be different from the second anti-connector sequence. The sequence derived from a TCR V gene of the first nucleic acid molecule and the second nucleic acid molecule can be derived from a different TCR V gene. The composition can further comprise at least one nucleic acid molecule comprising a sequence encoding a CDR3 of a TCR chain. The at least one nucleic acid molecule can further comprise a first connector sequence complementary to the first anti-connector sequence.

The present disclosure provides compositions and methods for the assembly or synthesis of a TCR library comprising a plurality of TCRs. In some cases, it may be useful to isolate or purify a particular TCR sequence (e.g., a TCR-of-interest) from the TCR library for further characterization or manipulation. To do this, a barcode can be included in the nucleic acid molecules or fragments used to construct the sequence encoding a TCR or portion thereof. In some cases, a nucleic acid fragment comprising a sequence encoding a CDR3 comprises a barcode. In some cases, a nucleic acid fragment comprising a sequence encoding a first CDR3 of a first TCR chain and a second CDR3 of a second TCR chain comprises a barcode. For example, a CDR3-J oligo or paired CDR3-J oligo can comprise a barcode. The connector sequence (or in some cases, the anti-connector sequence) can comprise a barcode. The inter-chain connector (or ICC) of the CDR3-J oligo can comprise a barcode. The barcode can be a primer binding site, e.g., a TCR-specific primer-binding site or DOPBS.

For example, each sequence encoding a unique paired CDR3-J in the paired CDR3-J oligo pool (e.g., FIG. 1A) can comprise a unique barcode (or a unique DOPBS). The sequences of the DOPBSes can be arbitrarily designed. The sequences of the DOPBSes can be designed to avoid common pitfalls such as unwanted secondary structures, restriction sites, similarity with other sequences in the TCR genes, or similarities between primer-binding sites. The barcode (or DOPBS) can be at least about 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, or more nucleotides long. The DOPBS can be an additional sequence included in each sequence of the paired CDR3-J pool. The DOPBS can be a sequence already included in each sequence of the paired CDR3-J pool. For example, a connector sequence or portion thereof can be used as a DOPBS. The sequences listed in Table 3 can be used as DOPBSes. The product of Step (9) of FIG. 1C can be used as the template in a dial-out PCR using a forward primer corresponding to T2A-3, and a reverse primer corresponding to the DOPBS associated with the TCR-of-interest. The PCR product can be subjected to Steps (10) and (11) of FIG. 1C. The final product can contain primarily the TCR-of-interest.
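As a non-limiting illustration of screening candidate primer-binding-site barcodes against some of the pitfalls listed above, the following minimal sketch rejects candidates that contain a forbidden restriction site, fall outside a GC-content window, or are too similar to an already accepted barcode. The thresholds, the GC window, the candidate sequences, and the function names are hypothetical; secondary-structure prediction and comparison against TCR gene sequences are not modeled, and a caller would need to supply both orientations of any non-palindromic restriction site.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def select_primer_sites(candidates, forbidden_sites, min_distance,
                        gc_range=(0.4, 0.6)):
    """Return the candidates that pass all filters, in input order.

    Filters (all hypothetical design rules for illustration):
      1. GC fraction must lie within gc_range (inclusive).
      2. No forbidden restriction site may appear in the candidate.
      3. Hamming distance to every accepted candidate of the same
         length must be at least min_distance.
    """
    accepted = []
    for cand in candidates:
        cand = cand.upper()
        gc = (cand.count("G") + cand.count("C")) / len(cand)
        if not gc_range[0] <= gc <= gc_range[1]:
            continue
        if any(site in cand for site in forbidden_sites):
            continue
        if any(len(cand) == len(kept) and hamming(cand, kept) < min_distance
               for kept in accepted):
            continue
        accepted.append(cand)
    return accepted


if __name__ == "__main__":
    pool = ["ACGTACGTAC", "ACGTACGTAG", "GGTCTCATAT", "AAAAAAAATT", "TGCATGCATG"]
    # GGTCTC / GAGACC: BsaI site in both orientations.
    print(select_primer_sites(pool, ["GGTCTC", "GAGACC"], min_distance=3))
```

Keeping accepted barcodes mutually dissimilar is one simple way to reduce the chance that a dial-out primer for one TCR-of-interest anneals to the barcode of another.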

Expression of TCRs

Using the methods provided herein, a pool of nucleic acid molecules, each encoding a TCR or portion thereof, can be further delivered into a host cell for expression. The constructed nucleic acid molecule can be inserted into vectors in order to be expressed in a host cell. The constructed nucleic acid molecule may be delivered into a recipient cell as a linear or circular nucleic acid strand. In some cases, the constructed nucleic acid or vector comprising the constructed nucleic acid can be delivered into a recipient cell by electroporation. In some cases, the constructed nucleic acid or vector comprising the constructed nucleic acid can be delivered by a carrier such as a cationic polymer.

The vector can be a plasmid, transposon (e.g., Sleeping Beauty, PiggyBac), adenoviral vector, AAV vector, retroviral vector or lentiviral vector. Non-limiting examples of a vector include a plasmid, shuttle vector, phagemid, cosmid, virion, retroviral vector, adenoviral vector or particle, and/or a vector commonly used in gene therapy. Non-limiting examples of suitable plasmid vectors include pUC, pBR322, pET, pBluescript, and variants thereof. Further, a vector can comprise additional expression control sequences (e.g., enhancer sequences, Kozak sequences, polyadenylation sequences, transcriptional termination sequences, etc.), selectable marker sequences (e.g., antibiotic resistance genes), origins of replication, and the like. A vector may include nucleic acid sequences that permit it to replicate in a host cell, such as an origin of replication. A vector may also include one or more selectable marker genes and other genetic elements. A vector can be an expression vector that includes a constructed nucleic acid sequence encoding a TCR or a portion thereof according to the present disclosure operably linked to sequences allowing for the expression of the TCR. Additional examples of vectors include, but are not limited to, viral and non-viral vectors, such as retroviral vectors (including lentiviral vectors), adenoviral vectors including replication competent, replication deficient and gutless forms thereof, adeno-associated virus (AAV) vectors, simian virus 40 (SV-40) vectors, bovine papilloma vectors, Epstein-Barr vectors, herpes vectors, vaccinia vectors, Moloney murine leukemia vectors, Harvey murine sarcoma virus vectors, murine mammary tumor virus vectors, Rous sarcoma virus vectors and nonviral plasmids. Baculovirus vectors can be suitable for expression in insect cells. The non-viral vector can be formulated into a nanoparticle, a cationic lipid, a cationic polymer, a metallic nanopolymer, a nanorod, a liposome, a micelle, a microbubble, a cell-penetrating peptide, or a liposphere.

In some embodiments, the vector is a self-amplifying RNA replicon, also referred to as self-replicating (m)RNA, self-replication (m)RNA, self-amplifying (m)RNA, or RNA replicon. The self-amplifying RNA replicon is an RNA that can replicate itself. In some embodiments, the self-amplifying RNA replicon can replicate itself inside of a cell. In some embodiments, the self-amplifying RNA replicon encodes an RNA polymerase and a molecule of interest. The RNA polymerase may be an RNA-dependent RNA polymerase (RDRP or RdRp). The self-amplifying RNA replicon may also encode a protease or an RNA capping enzyme. In some embodiments, the self-amplifying RNA replicon vector is of or derived from the Togaviridae family of viruses known as alphaviruses which can include Eastern Equine Encephalitis virus (EEE), Venezuelan Equine Encephalitis virus (VEE), Everglades virus, Mucambo virus, Pixuna virus, Western Equine Encephalitis virus (WEE), Sindbis virus, South African Arbovirus No. 86, Semliki Forest virus, Middelburg virus, Chikungunya virus, O'nyong-nyong virus, Ross River virus, Barmah Forest Virus, Getah Virus, Sagiyama virus, Bebaru virus, Mayaro virus, Una virus, Aura virus, Whataroa virus, Babanki virus, Kyzylagach virus, Highlands J Virus, Fort Morgan virus, Ndumu virus, Buggy Creek virus, and any other virus classified by the International Committee on Taxonomy of Viruses (ICTV) as an alphavirus. In some embodiments, the self-amplifying RNA replicon is or contains parts from an attenuated form of the alphavirus, such as the VEE TC-83 vaccine strain. In some embodiments, the self-amplifying RNA replicon vector has been engineered or selected in vitro, in vivo, ex vivo, or in silico for a specific function (e.g., prolonged or increased bipartite immunoreceptor expression) in the host cell, target cell, or organism.
For example, a population of host cells harboring different variants of the self-amplifying RNA replicon can be selected based on the expression level of one or more molecules of interest (encoded in the self-amplifying RNA replicon or in the host genome) at different time points. In some embodiments, the selected or engineered self-amplifying RNA replicon has been modified to reduce the type I interferon response, the innate antiviral response, or the adaptive immune response from the host cell or organism, which results in the RNA replicon's protein expression persisting longer or expressing at higher levels in the host cell, target cell, or organism. In some embodiments, this optimized self-amplifying RNA replicon sequence is obtained from an individual cell or population of cells with the desired phenotypic trait (e.g., higher or more sustained expression of the molecules of interest, or reduced innate antiviral immune response against the vector compared to the wildtype strains or the vaccine strains). In some embodiments, the cells harboring the desired or selected self-amplifying RNA replicon sequence are obtained from a subject (e.g., a human or an animal) with beneficial response characteristics (e.g., an elite responder or subject in complete remission) after being treated with a therapeutic agent comprising a self-amplifying RNA replicon. In some embodiments, the self-amplifying RNA replicon can contain one or more sub-genomic sequence(s) to produce one or more sub-genomic polynucleotide(s). In some embodiments, the sub-genomic polynucleotides act as functional mRNA molecules for translation by the cellular translation machinery. A sub-genomic polynucleotide can be produced via the function of a defined sequence element (e.g., a sub-genomic promoter or SGP) on the self-amplifying RNA replicon that directs a polymerase to produce the sub-genomic polynucleotide from a sub-genomic sequence. In some embodiments, the SGP is recognized by an RNA-dependent RNA polymerase (RDRP or RdRp). 
In some embodiments, multiple SGP sequences are present on a single self-amplifying RNA replicon and can be located upstream of a sub-genomic sequence encoding a bipartite immunoreceptor, a constituent of the bipartite immunoreceptor, or an additional agent. In some embodiments, the nucleotide length or composition of the SGP sequence can be modified to alter the expression characteristics of the sub-genomic polynucleotide. In some embodiments, non-identical SGP sequences are located on the self-amplifying RNA replicon such that the ratios of the corresponding sub-genomic polynucleotides are different from instances where the SGP sequences are identical. In some embodiments, non-identical SGP sequences direct the production of a TCR and an additional agent (e.g., a cytokine) such that they are produced at a ratio relative to one another that leads to increased expression of the TCR, increased or faster expansion of the target cell without cytotoxic effects to the target cell or host, or a dampened innate or adaptive immune response against the RNA replicon. In some embodiments, the location of the sub-genomic sequences and SGP sequences relative to one another and the genomic sequence itself can be used to alter the ratio of sub-genomic polynucleotides relative to one another. In some embodiments, the SGP and sub-genomic sequence encoding the TCR can be located downstream of an SGP and sub-genomic region encoding the additional agent such that the expression of the TCR is substantially increased relative to the additional agent. In some embodiments, the RNA replicon or SGP has been selected or engineered to express an optimal amount of the cytokine such that the cytokine promotes the expansion of the T cell or augments the therapeutic effect of the TCR but does not cause severe side effects such as cytokine release syndrome, cytokine storm, or neurological toxicity.

The expression of the two chains can be driven by two promoters or by one promoter. In some cases, two promoters are used. In some cases, the two promoters, along with their respective protein-coding sequences for the two chains, can be arranged in a head-to-head, a head-to-tail, or a tail-to-tail orientation. In some cases, one promoter is used. The two protein-coding sequences can be linked in frame such that one promoter can be used to express both chains. In such cases, the two protein-coding sequences can be arranged in a head-to-tail orientation and can be connected with a ribosome binding site (e.g., internal ribosome entry site or IRES), a protease cleavage site, or a self-processing cleavage site (such as a sequence encoding a 2A peptide) to facilitate bicistronic expression. In some cases, the two chains can be linked with peptide linkers so that the two chains can be expressed as a single-chain polypeptide. Each expressed chain may contain the full variable domain sequence including the rearranged V(D)J gene. Each expressed chain may contain the full variable domain sequence including CDR1, CDR2, and CDR3. Each expressed chain may contain the full variable domain sequence including FR1, CDR1, FR2, CDR2, FR3, and CDR3. In some cases, each expressed chain may further contain a constant domain sequence.

To create expression vectors, additional sequences may be added to the constructed nucleic acid molecules. These additional sequences include vector backbone (e.g., elements required for the vector's replication in a target cell or in a temporary host such as E. coli), promoters, IRES, sequence encoding the self-cleaving peptide, terminators, accessory genes (such as payloads), as well as partial sequences of the immunoreceptor polynucleotides (such as part of the sequences encoding the constant domains).

Protease cleavage sites include, but are not limited to, an enterokinase cleavage site: (Asp)4Lys (SEQ ID NO: 288); a factor Xa cleavage site: Ile-Glu-Gly-Arg (SEQ ID NO: 289); a thrombin cleavage site, e.g., Leu-Val-Pro-Arg-Gly-Ser (SEQ ID NO: 290); a renin cleavage site, e.g., His-Pro-Phe-His-Leu-Val-Ile-His (SEQ ID NO: 291); a collagenase cleavage site, e.g., X-Gly-Pro (where X is any amino acid); a trypsin cleavage site, e.g., Arg-Lys; a viral protease cleavage site, such as a viral 2A or 3C protease cleavage site, including, but not limited to, a protease 2A cleavage site from a picornavirus, a Hepatitis A virus 3C cleavage site, a human rhinovirus 2A protease cleavage site, a picornavirus 3 protease cleavage site; and a caspase protease cleavage site, e.g., DEVD (SEQ ID NO: 292) recognized and cleaved by activated caspase-3, where cleavage occurs after the second aspartic acid residue. In some embodiments, the present disclosure provides an expression vector comprising a protease cleavage site, wherein the protease cleavage site comprises a cellular protease cleavage site or a viral protease cleavage site. In some embodiments, the first protein cleavage site comprises a site recognized by furin; VP4 of IPNV; tobacco etch virus (TEV) protease; 3C protease of rhinovirus; PC5/6 protease; PACE protease; LPC/PC7 protease; enterokinase; Factor Xa protease; thrombin; genenase I; MMP protease; Nuclear inclusion protein a (NIa) of turnip mosaic potyvirus; NS2B/NS3 of Dengue type 4 flaviviruses; NS3 protease of yellow fever virus; ORF V of cauliflower mosaic virus; KEX2 protease; CB2; or 2A. In some embodiments, the protein cleavage site is a viral internally cleavable signal peptide cleavage site. In some embodiments, the viral internally cleavable signal peptide cleavage site comprises a site from influenza C virus, hepatitis C virus, hantavirus, flavivirus, or rubella virus.
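As an illustration only, some of the peptide motifs listed above can be kept in a small lookup table and located in a candidate linker sequence. The following Python sketch is not part of the disclosure: the names `CLEAVAGE_MOTIFS`, `find_cleavage_motifs`, `caspase3_cut_index`, and the example linker are hypothetical. Motifs are written in one-letter code (e.g., DDDDK for (Asp)4Lys), and a cut position is reported only for DEVD, the one case where the text states it.

```python
# Hypothetical lookup of cleavage motifs named in the text (one-letter codes).
CLEAVAGE_MOTIFS = {
    "enterokinase": "DDDDK",  # (Asp)4Lys
    "factor Xa": "IEGR",      # Ile-Glu-Gly-Arg
    "thrombin": "LVPRGS",     # Leu-Val-Pro-Arg-Gly-Ser
    "caspase-3": "DEVD",      # cleaved after the second Asp
}


def find_cleavage_motifs(protein):
    """Return {protease: [0-based start indices]} for every motif occurrence."""
    hits = {}
    for protease, motif in CLEAVAGE_MOTIFS.items():
        positions = []
        start = protein.find(motif)
        while start != -1:
            positions.append(start)
            start = protein.find(motif, start + 1)
        if positions:
            hits[protease] = positions
    return hits


def caspase3_cut_index(protein):
    """0-based index of the first residue after the DEVD cut, or None."""
    start = protein.find("DEVD")
    return None if start == -1 else start + len("DEVD")


example_linker = "GSGDEVDGSGIEGRGS"  # made-up linker, for illustration only
print(find_cleavage_motifs(example_linker))
print(caspase3_cut_index(example_linker))
```

A table of this kind only flags where a motif occurs; whether a given protease actually cleaves there depends on sequence context and would need to be checked experimentally.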

A suitable IRES element to include in the vector of the present disclosure can comprise an RNA sequence capable of engaging a eukaryotic ribosome. In some embodiments, an IRES element of the present disclosure is at least about 250 base pairs, at least about 350 base pairs, or at least about 500 base pairs. An IRES element of the present disclosure can be derived from the DNA of an organism including, but not limited to, a virus, a mammal, and a Drosophila. In some cases, a viral DNA from which an IRES element is derived includes, but is not limited to, picornavirus complementary DNA (cDNA), encephalomyocarditis virus (EMCV) cDNA and poliovirus cDNA. Examples of mammalian DNA from which an IRES element is derived include, but are not limited to, DNA encoding immunoglobulin heavy chain binding protein (BiP) and DNA encoding basic fibroblast growth factor (bFGF). An example of Drosophila DNA from which an IRES element is derived includes, but is not limited to, an Antennapedia gene from Drosophila melanogaster. Additional examples of picornavirus IRES elements include, for instance, poliovirus IRES, encephalomyocarditis virus IRES, or hepatitis A virus IRES. Examples of flaviviral IRES elements include hepatitis C virus IRES, GB virus B IRES, or a pestivirus IRES, including but not limited to bovine viral diarrhea virus IRES or classical swine fever virus IRES.

Examples of self-processing cleavage sites include, but are not limited to, an intein sequence; modified intein; hedgehog sequence; other hog-family sequence; a 2A sequence, e.g., a 2A sequence derived from Foot and Mouth Disease Virus (FMDV); and variations thereof for each.

A vector for recombinant immunoglobulin or other protein expression may include any number of promoters, wherein the promoter is constitutive, regulatable or inducible, cell type specific, tissue-specific, or species specific. Further examples include tetracycline-responsive promoters. The vector can be a replicon adapted to the host cell in which the recombinantly constructed gene is to be expressed, and it can comprise a replicon functional in a bacterial cell as well, for example, Escherichia coli. The promoter can be constitutive or inducible, where induction is associated with the specific cell type or a specific level of maturation, for example. Alternatively, a number of viral promoters can be suitable. Examples of promoters include the β-actin promoter, SV40 early and late promoters, immunoglobulin promoter, human cytomegalovirus promoter, retrovirus promoter, elongation factor 1A (EF-1A) promoter, phosphoglycerate kinase (PGK) promoter, and the Friend spleen focus-forming virus promoter. The promoters may or may not be associated with enhancers, wherein the enhancers may be naturally associated with the particular promoter or associated with a different promoter.

Applications

The compositions and methods described herein can have various applications. An example application can be to re-construct sequences encoding natively paired TCRs from sequencing data, e.g., single cell sequencing data. In some applications, one may want to re-construct sequences encoding natively paired TCRs identified from tumor-infiltrating T cells. In these applications, a fresh tissue sample (e.g., a fresh solid tumor sample) from a subject may be used for single cell sequencing to obtain sequence information of both TCR chains of natively paired TCRs. However, when a tissue sample (e.g., a solid matter sample that is not a bodily fluid sample) containing tumor-infiltrating cells is a frozen sample or a fixed sample (e.g., FFPE sample), it may be challenging to separate cells to obtain a single cell suspension. In these cases, a blood sample containing peripheral T cells from the same subject may be used for single cell sequencing to identify sequences of natively paired TCRs. Because the blood sample may contain tumor-infiltrating T cells released from the tissue sample into the blood stream, the sequences obtained from the blood sample may contain the sequences from these tumor-infiltrating T cells. Then, the tissue sample from the same subject can be used for bulk sequencing. Although bulk sequencing of the tissue sample may not provide paired sequences of natively paired TCRs, it can provide CDR3 sequences for individual TCR chains. The CDR3 sequences obtained in the bulk sequencing of the tissue sample (referred to as “tissue CDR3 sequences” herein) can then be aligned with paired sequences obtained in the single cell sequencing of the blood sample. If the CDR3 sequences of the paired sequences match the tissue CDR3 sequences, the paired sequences can be identified and used for any down-stream applications.
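The matching step described above can be sketched as a simple set lookup. The Python below is a minimal illustration under stated assumptions, not the disclosed implementation: the record type `PairedTCR`, the function `match_paired_to_tissue`, the `require_both` switch, and the CDR3 strings are all hypothetical, and matching is by exact string identity (a real pipeline might tolerate sequencing errors or compare nucleotide sequences instead).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedTCR:
    """One natively paired TCR from single cell sequencing of the blood sample."""
    clone_id: str
    cdr3_alpha: str
    cdr3_beta: str


def match_paired_to_tissue(paired, tissue_cdr3_alpha, tissue_cdr3_beta,
                           require_both=False):
    """Return the paired TCRs whose CDR3 appears among the tissue CDR3 sequences.

    By default a match on either chain is enough; require_both=True demands
    that both the alpha and the beta CDR3 were seen in the bulk tissue data.
    """
    alpha_set = set(tissue_cdr3_alpha)
    beta_set = set(tissue_cdr3_beta)
    matched = []
    for tcr in paired:
        alpha_hit = tcr.cdr3_alpha in alpha_set
        beta_hit = tcr.cdr3_beta in beta_set
        keep = (alpha_hit and beta_hit) if require_both else (alpha_hit or beta_hit)
        if keep:
            matched.append(tcr)
    return matched


# Made-up example data (not taken from the disclosure).
blood_pairs = [
    PairedTCR("clone_1", "CAVRDSNYQLIW", "CASSLGQGYEQYF"),
    PairedTCR("clone_2", "CAASGGSYIPTF", "CASSPTGNTEAFF"),
]
tissue_beta = ["CASSLGQGYEQYF", "CASSQDRGNQPQHF"]
hits = match_paired_to_tissue(blood_pairs, [], tissue_beta)
print([tcr.clone_id for tcr in hits])
```

Matching on a single chain is the more permissive choice and fits the case where bulk sequencing recovers only one chain of a clone; requiring both chains trades sensitivity for fewer chance matches.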

Single cell sequencing refers to obtaining sequence information from individual cells. In single cell sequencing, a population of cells can be made into a single cell suspension and compartmentalized into individual partitions. Within each partition, the sequences released from a single cell can be barcoded and later sequenced. Various single cell sequencing methods can be used for TCR reconstruction (see De Simone M, Rossetti G and Pagani M (2018) Single Cell T Cell Receptor Sequencing: Techniques and Future Challenges. Front. Immunol. 9:1638). Bulk sequencing refers to obtaining sequence information from a population of cells. In bulk sequencing, nucleic acid molecules can be isolated from a mixture of cells and subjected to sequencing together.

FIG. 9A shows an example workflow of using a blood sample to identify tumor-infiltrating TCRs in the tumor sample. First, a blood sample can be drawn from a patient. Next, a PBMC sample containing peripheral blood mononuclear cells can be isolated from the blood sample. For example, these cells can be extracted from whole blood using Ficoll, a hydrophilic polysaccharide that separates layers of blood, and gradient centrifugation. Next, T cells can be isolated from the PBMC sample. T cells can be isolated from PBMCs by lysing the red blood cells and depleting the monocytes, for example, by centrifugation through a PERCOLL™ gradient or by counterflow centrifugal elutriation. Optionally, a subpopulation of T cells may be further enriched by marker-based sorting. The marker can be a cell surface marker. Examples of cell surface markers include, but are not limited to, CD39, CD69, CD103, CD25, PD-1, TIM-3, OX-40, 4-1BB, CD137, CD3, CD28, CD4, CD8, CD45RA, CD45RO, GITR and FoxP3. The marker can be a cytokine. Examples of cytokine markers include, but are not limited to, IFN-γ, TNF-alpha, IL-17A, IL-2, IL-3, IL-4, GM-CSF, IL-10, IL-13, granzyme B and perforin. The T cells or the subpopulation of T cells can then be subjected to single cell sequencing to obtain paired sequences of natively paired TCRs (e.g., informatically paired TCR sequences in FIG. 9A). A tumor sample may also be obtained from the same patient. The tumor sample may be a fixed or frozen sample. For example, the tumor sample may be fixed by a fixing agent such as formaldehyde. The tumor sample may be a formalin-fixed paraffin-embedded (FFPE) tissue sample. Next, the tumor sample can be subjected to bulk sequencing to obtain CDR3 sequences of TCR chains. Next, the CDR3 sequences obtained from the tumor sample can be compared with the CDR3 sequences of the paired sequences to identify tumor-infiltrating TCRs. The tumor-infiltrating TCRs can be expressed in normal T cells or cell lines, which are shown as “virtual TILs” in FIG. 9A.

FIG. 9B shows an example application of virtual TILs. The virtual TILs can comprise a reporter system, which can be used for reporter-based T cell selection for target-reactive TCRs. For example, the virtual TILs can be reporter cells comprising a reporter gene, which reporter gene is regulated to send a signal when a TCR of the cell binds to a target antigen. These virtual TILs can be activated by contacting them with antigen-loaded antigen-presenting cells (APCs) or artificial APCs. Next, target-reactive T cells can be selected out, for example, by FACS, based on the signal generated by the reporter system or other selection mechanisms (e.g., cell surface marker or cytokine marker). The selection may be based on cell surface marker expression on the virtual TILs after the cells contact MHC-bound antigen. The cell surface marker may be CD25, CD69, CD39, CD103, CD137, as well as other T cell activation markers, or any combination thereof. The selection may be based on calcium influx. The selection may also be based on reporter gene expression. The reporter gene may encode a fluorescent protein (such as GFP and mCherry). The reporter gene may be under the control of a transcription factor which is regulated by TCR signaling. Examples of these transcription factors include, but are not limited to, AP-1, NFAT, NF-kappa-B, Runx1, Runx3, etc. The selection may be based on cytokines released from the activated virtual TILs using methods such as ICS and cytokine capture assay. FIG. 9C shows another application of virtual TILs. After identifying target-reactive TCRs, the target-reactive TCRs can be delivered and expressed in host cells such as autologous T cells (the T cells isolated from the same patient from whom the tissue sample and the blood sample were obtained). The target-reactive TCRs can be delivered and expressed in an allogeneic T cell. The T cells expressing the target-reactive TCRs can then be administered into the same patient to treat diseases such as cancer.

The method of identifying a sequence of a natively paired T-cell receptor (TCR) in a tissue sample (e.g., a solid sample) from a subject can comprise identifying one or more paired sequences of one or more natively paired TCRs in a sample containing a plurality of peripheral T cells obtained from the subject. Each of the one or more paired sequences can comprise a CDR3 sequence. Next, a tissue CDR3 sequence of a TCR chain of a TCR in the tissue sample can be identified, for which the other TCR chain to which it is natively paired may be unknown. The tissue CDR3 sequence can match a CDR3 sequence of at least one paired sequence of the one or more paired sequences of the one or more natively paired TCRs, thereby identifying the at least one paired sequence as the sequence of the natively paired TCR in the tissue sample. Also provided herein is a method of identifying a target-reactive T-cell receptor (TCR). The method can comprise providing a cell comprising the TCR identified using the methods described herein. Next, the cell can be contacted with a target antigen presented by an antigen-presenting cell (APC). The cell can bind to the target antigen presented by the APC via the TCR, thereby identifying the TCR as the target-reactive TCR.

The APC described herein can be a professional APC such as a dendritic cell, macrophage, or B cell. The APC can be a monocyte or monocyte-derived dendritic cell. An artificial APC (aAPC) can express ligands for T cell receptor and costimulatory molecules and can activate and expand T cells for transfer, while improving their potency and function in some cases. An aAPC can be engineered to express any gene for T cell activation. An aAPC can be engineered to express any gene for T cell expansion. An aAPC can be a bead, a cell, a protein, an antibody, a cytokine, or any combination. An aAPC can deliver signals to a cell population that may undergo genomic transplant. For example, an aAPC can deliver a signal 1, signal 2, signal 3 or any combination. A signal 1 can be an antigen recognition signal. For example, signal 1 can be ligation of a TCR by a peptide-MHC complex or binding of agonistic antibodies directed towards CD3 that can lead to activation of the CD3 signal-transduction complex. Signal 2 can be a co-stimulatory signal. For example, a co-stimulatory signal can be anti-CD28, inducible co-stimulator (ICOS), CD27, and 4-1BB (CD137), which bind to ICOS-L, CD70, and 4-1BBL, respectively. Signal 3 can be a cytokine signal. A cytokine can be any cytokine. A cytokine can be IL-2, IL-7, IL-12, IL-15, IL-21, or any combination thereof.

In some cases, an aAPC may be used to activate and/or expand a cell population. In some cases, an artificial APC may not induce allospecificity. An aAPC may not express HLA in some cases. An aAPC may be genetically modified to stably express genes that can be used for activation and/or stimulation. In some cases, a K562 cell may be used for activation. A K562 cell may also be used for expansion. A K562 cell can be a human erythroleukemic cell line. A K562 cell may be engineered to express genes of interest. K562 cells may not endogenously express HLA class I, II, or CD1d molecules but may express ICAM-1 (CD54) and LFA-3 (CD58). K562 may be engineered to deliver a signal 1 to T cells. For example, K562 cells may be engineered to express HLA class I. In some cases, K562 cells may be engineered to express additional molecules such as B7, CD80, CD83, CD86, CD32, CD64, 4-1BBL, anti-CD3, anti-CD3 mAb, anti-CD28, anti-CD28 mAb, CD1d, anti-CD2, membrane-bound IL-15, membrane-bound IL-17, membrane-bound IL-21, membrane-bound IL-2, truncated CD19, or any combination. In some cases, an engineered K562 cell can express a membranous form of anti-CD3 mAb, clone OKT3, in addition to CD80 and CD83. In some cases, an engineered K562 cell can express a membranous form of anti-CD3 mAb, clone OKT3, and a membranous form of anti-CD28 mAb in addition to CD80 and CD83.

Kits

The compositions described herein can be provided in a kit. For example, the kit can comprise a container having a pool of nucleic acid molecules that can be used to construct a plurality of polynucleotide molecules, each polynucleotide encoding a TCR chain or a portion thereof, or a cognate pair of TCR chains. In some cases, each nucleic acid molecule of the pool of nucleic acid molecules encodes a CDR3 of the TCR chain. In some cases, each nucleic acid molecule of the pool of nucleic acid molecules encodes a first CDR3 and a second CDR3 of a cognate pair of TCR chains. In some cases, each nucleic acid molecule of the pool of nucleic acid molecules comprises a sequence derived from a TCR V gene. In some cases, each nucleic acid molecule of the pool of nucleic acid molecules comprises a connector sequence as described herein. The connector sequence may have a different sequence than other connector sequences in the same pool of nucleic acid molecules. The kit can comprise one or more containers, each container containing a pool of nucleic acid molecules. The nucleic acid molecules provided in the kit can be in liquid form or dried form (e.g., lyophilized form).

The kit can further comprise instructional material to direct a user to use the pool of nucleic acid molecules to construct the plurality of polynucleotide molecules encoding TCRs.

The kit can further comprise at least one reagent (e.g., buffer, enzyme, additive, etc.) that can be used in the reaction of constructing nucleic acid molecules.

EXAMPLES

Example 1. Converting a CDR3-J Oligonucleotide Pool to a Full-Length, Expressible TCR Pool

This example uses three Type IIS restriction enzymes to create sticky ends. Such enzymes are commercially available. In this example, two enzymes that create a 4-bp 5′ overhang (for example, BbsI, BbvI, BcoDI, BsaI, BsmBI, FokI, etc.) and one restriction enzyme that creates a blunt end or 3′ overhang (for example, BseRI, BsrDI, BtsI, MlyI, etc.) are used. The optimal enzyme set to use can depend on practical factors (e.g., local availability, cutting efficiency, star activity) and can be easily chosen experimentally. Here, the first two restriction enzymes are called TIISRE1 and TIISRE2, and the last restriction enzyme is called TIISRE3.
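How a Type IIS enzyme produces a 4-base 5′ overhang outside its recognition site can be illustrated with a short Python sketch. It is a simplified model, not part of the disclosure: the function name `type_iis_cut` is hypothetical, and the default site and cut offsets (GGTCTC, cutting 1 nt downstream on the top strand and 5 nt downstream on the bottom strand) are assumed here to model a BsaI-like enzyme. Only the first forward-oriented site on the top strand is handled.

```python
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMP)[::-1]


def type_iis_cut(top, site="GGTCTC", top_offset=1, bottom_offset=5):
    """Cut a duplex, given as its top strand (5'->3'), at the first forward site.

    Returns (left_top, right_top, overhang). `overhang` is the top-strand
    sequence between the two nicks: it is the 5' overhang on the top strand
    of the right fragment, and its reverse complement is the 5' overhang on
    the bottom strand of the left fragment.
    """
    start = top.find(site)
    if start == -1:
        raise ValueError("recognition site not found on the top strand")
    site_end = start + len(site)
    top_cut = site_end + top_offset        # nick position on the top strand
    bottom_cut = site_end + bottom_offset  # nick position on the bottom strand
    if bottom_cut > len(top):
        raise ValueError("cut position falls outside the molecule")
    return top[:top_cut], top[top_cut:], top[top_cut:bottom_cut]


left, right, overhang = type_iis_cut("AAAGGTCTCTACGTTTTT")
print(left, right, overhang, revcomp(overhang))
```

Because the overhang lies outside the recognition site, its four bases can be chosen freely, which is what lets the cleavage products in this example be designed to match the first bases of TRBC1 or TRAC.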

In this example, the paired CDR3-J oligonucleotides are synthesized in ‘head-to-tail’ orientation with respect to the coding sequence of the alpha and beta CDR3-J. In other words, the alpha CDR3-J and beta CDR3-J are synthesized in the same 5′ to 3′ direction. The resultant full-length, expressible TCR polynucleotide is also in head-to-tail orientation. The paired CDR3-J oligonucleotides can be synthesized in other orientations, for example, head-to-head and tail-to-tail. Methods described herein can be combined with methods described in U.S. Provisional Patent Applications Nos. 62/718,227, 62/725,842, 62/732,898, 62/818,355 and 62/823,831, each of which is entirely incorporated herein by reference, to design paired CDR3-J oligonucleotides and obtain full-length, expressible TCR polynucleotides in other orientations.

As shown in FIGS. 1A-1C, the paired CDR3-J oligo contains the reverse-complement sequence of TRBJ, CDR3beta, TRAJ, and CDR3alpha, in the 5′ to 3′ order, with other intervening domains to be described below. Throughout this document, the symbol ‘*’ denotes complementarity. For example, if P refers to a polynucleotide sequence, then P* refers to the reverse complement of P. Also, when appropriate, the letter X is used to refer to A or B. For example, TRXV may be used to refer to TRAV and TRBV collectively. For clarity, in this example and in FIGS. 1A-1C, TRAJ domain and TRBJ domain refer to the polynucleotide sequences encoding parts of TRAJ region and TRBJ region, respectively, that are not included in the CDR3.
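The ‘*’ notation can be made concrete with a few lines of Python. This is only an illustrative sketch: the function name `star` and the short placeholder domains are hypothetical. It also shows a property that is useful when reading the domain order of an oligo: the reverse complement of two concatenated domains is the reverse complements concatenated in the opposite order, i.e., (PQ)* = Q*P*.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def star(p):
    """Return P*, the reverse complement of polynucleotide P."""
    return p.translate(_COMPLEMENT)[::-1]


# Placeholder domains (made up, not real TCR sequences).
domain_p = "ATGGCC"
domain_q = "TTGACA"

print(star(domain_p))                       # P*
print(star(domain_p + domain_q))            # (PQ)*
print(star(domain_q) + star(domain_p))      # Q*P*, same string as the line above
```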

BCC stands for “beta constant connector”, whose function is to connect with the TRBC sequence. ConB # is the connector for a specific TRBV sequence, with the symbol # denoting a numerical ID of a TRBV gene. Similarly, ConA # is the connector for a specific TRAV sequence. ICC stands for “inter-chain connector”, which will be used for connecting TRBV to ConB #, as well as connecting TRAC to TRAJ.

ConB # and ConA # domains can be codon-diversified (see Example 2) so that ConX # for different TRBV genes are sufficiently different at the nucleotide level that ConX # and ConX #* can hybridize with high yield only when the numerical IDs for ConX # and ConX #* are the same.
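The idea of codon diversification can be sketched in Python as follows. This is a simplified stand-in, not the algorithm of Example 2: it recodes a connector with randomly chosen synonymous codons and keeps the candidate that is farthest, by Hamming distance, from the other connectors. Hamming distance is used here only as a crude proxy for the thermodynamic hybridization yield that the disclosure actually computes, and the names `diversify`, `hamming`, and `translate` are hypothetical. The sketch assumes in-frame connectors of equal length.

```python
import itertools
import random

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa
               for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}
AA_TO_CODONS = {}
for _codon, _aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(_aa, []).append(_codon)


def translate(dna):
    """Translate an in-frame DNA string; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, usable, 3))


def hamming(a, b):
    """Number of mismatched positions over the shared length of a and b."""
    return sum(x != y for x, y in zip(a, b))


def diversify(connector, others, trials=200, seed=0):
    """Return a synonymous recoding of `connector` that maximizes the smallest
    Hamming distance to every sequence in `others` (random search)."""
    rng = random.Random(seed)
    protein = translate(connector)

    def score(seq):
        return min((hamming(seq, o) for o in others), default=len(seq))

    best, best_score = connector, score(connector)
    for _ in range(trials):
        candidate = "".join(rng.choice(AA_TO_CODONS[aa]) for aa in protein)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best


original = "GCTGCTGCTGCTGCT"          # made-up connector encoding AAAAA
recoded = diversify(original, [original])
print(recoded, translate(recoded))
```

In practice the scoring function would be replaced by a predicted cross-hybridization yield, so that candidates are judged by how strongly they bind the wrong complement rather than by mismatch count alone.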

A library of 48 partially double-stranded TRAV #_GL polynucleotides (one for each TRAV gene in IMGT that is annotated as functional) can be prepared using conventional methods. All TRAV #_GL polynucleotides can be mixed to create the TRAV #_GL pool. GL stands for germline. The top strand of each TRAV #_GL polynucleotide contains (1) a P2A-3 domain, which encodes the 3′ portion of the self-cleaving P2A peptide, (2) a TRAV #_GL5 domain, which encodes the 5′ portion of the germline sequence of TRAV #, including L, FR1, CDR1, FR2, CDR2, and the portion of FR3 upstream of ConA #, in this order, and (3) ConA #, which encodes the final stretch of FR3 and is codon-diversified. The bottom strand of each TRAV #_GL polynucleotide contains TRAV #_GL5* and P2A-3*. Thus, the TRAV #_GL polynucleotide has a 3′ overhang with the sequence ConA #. A library and a pool of 48 TRBV #_GL polynucleotides can be similarly prepared. The P2A-3 domain in TRAV #_GL can be replaced by T2A-3 in TRBV #_GL. T2A is another self-cleaving peptide.
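The strand composition described above can be written out in a short Python sketch, which may help in checking which strand carries the single-stranded ConA # tail. The class `PartialDuplex`, the function `build_trav_gl`, and the placeholder sequences are hypothetical and not part of the disclosure; both strands are stored 5′ to 3′.

```python
from dataclasses import dataclass

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMP)[::-1]


@dataclass(frozen=True)
class PartialDuplex:
    top: str     # 5'->3'
    bottom: str  # 5'->3'

    @property
    def three_prime_overhang(self):
        """Unpaired 3' tail of the top strand, assuming the bottom strand
        pairs with the 5' part of the top strand."""
        return self.top[len(self.bottom):]


def build_trav_gl(p2a3, trav_gl5, con_a):
    """Assemble one TRAV #_GL polynucleotide from its three domains."""
    top = p2a3 + trav_gl5 + con_a
    bottom = revcomp(p2a3 + trav_gl5)  # reads TRAV #_GL5* then P2A-3*
    return PartialDuplex(top, bottom)


# Placeholder domains (made up, far shorter than real ones).
duplex = build_trav_gl("AAGG", "CCTTGA", "ACGTAC")
print(duplex.top, duplex.bottom, duplex.three_prime_overhang)
```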

A pool of 1,000 to 500,000 paired CDR3-J oligonucleotides can be prepared by chip-based synthesis.

In Step (1), the TRAV #_GL pool can be mixed with the paired CDR3-J pool at a temperature that allows specific hybridization between ConA # and ConA #*. Then, in Step (2), a DNA polymerase can be used to extend the top strand of TRAV #_GL, and a ligase can be used to ligate the paired CDR3-J oligo and the bottom strand of TRAV #_GL.

BCC contains the recognition site of TIISRE1. In Step (3), TIISRE1 can be used to cleave at BCC, leaving a 4-base 5′ overhang at the bottom strand. In this example, the 4 bases are the antisense of the first 4 bases of TRBC1. In Step (4), this cleavage product can be ligated to a pre-prepared TRBC P2A-5 SE, which contains the full TRBC1 sequence and a P2A-5 domain, and has a 4-base 5′ overhang at the beginning of the TRBC1 sequence. The P2A-5 domain is the 5′ end portion of the P2A coding sequence. SE stands for sticky end. This ligation product can be PCR-amplified in Step (5).

In Step (6), this amplification product can be circularized by ligation between P2A-5 and P2A-3 using the methods described in U.S. Provisional Patent Applications Nos. 62/718,227, 62/725,842, 62/732,898, 62/818,355 and 62/823,831. After ligation, P2A-5 and P2A-3 form P2A. In this example, the ICC contains the recognition site of TIISRE3, which, in Step (7), can be used to cleave immediately 3′ of ConB #* on the bottom strand. The cutting site on the top strand is less important. In Step (8), this cleavage product can be heated to separate the top and bottom strands. A primer containing the first ~20 bases of TRBC1 can be used to extend on the bottom strand, leaving a single-stranded region at the 3′ end of the bottom strand. At the tip of the 3′ end of this strand is the ConB #* domain. In Step (9), the TRBV #_GL pool can be added so that ConB # on the top strand of TRBV #_GL can hybridize with the corresponding ConB #*. DNA polymerase and ligase can be added to convert the hybridization product to fully double-stranded DNA.

The remnant of ICC also contains the recognition site of TIISRE2, which in Step (10) can be used to cleave ICC, leaving a 4-base 5′ overhang which is the antisense sequence of the first 4 bases of TRAC. In Step (11), a pre-prepared TRAC_SE can be ligated to the 5′ overhang, forming the complete TRAC sequence, similar to Step (4) described above.

The final product can be ligated into a lentiviral backbone or a proper ‘homology sequence’ used for CRISPR/TALEN/ZFN-based knock-in.

Example 2. Testing Codon Diversification Using Human TRAV and TRBV Sequences

In this example, a thermodynamics-based algorithm is provided to design codon-diversified ConA # and ConB # sequences. The algorithm is written in MATLAB language. Some variables and custom functions used in this algorithm will be described in the ‘note’ section below, with the rest described in the comments of the code or self-explanatory to skilled artisans. Some custom functions rely on thermodynamics-based simulation of DNA hybridization using publicly available thermodynamic parameters (e.g., ΔH and ΔS for base pair stacks) and models (e.g., ΔS as a function of loop size). These parameters and models have been extensively published by John SantaLucia Jr. A skilled artisan can readily write these functions from scratch or with the help of publicly available software packages such as NUPACK. The algorithm contains two stages: initial design and codon diversification, which are described in Script1 and Script2, respectively. In initial design, ConA # and ConB # sequences are designed according to the original TRAV or TRBV sequences. Hybridization yield of every ConX # to every ConX #* is then computed to serve as a baseline (FIG. 2A and FIG. 2B). FIG. 2A shows hybridization yield of the connector sequences designed according to the original TRAV sequences without codon diversification (ConA # to ConA #*). FIG. 2B shows hybridization yield of the connector sequences designed according to the original TRBV sequences without codon diversification (ConB # to ConB #*). During codon diversification, the codon choices of the last ~60 bases of some of the TRXV #_GL are randomized, and ConX # sequences that allow specific hybridization are chosen. Next, hybridization yield of every ConX # to every ConX #* using the codon-diversified sequence set is calculated to see if the codon diversification was successful (FIG. 3A and FIG. 3B). FIG. 3A shows hybridization yield of the codon-diversified connector sequences (ConA # to ConA #*). FIG. 3B shows hybridization yield of the codon-diversified connector sequences (ConB # to ConB #*).
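For readers unfamiliar with how a hybridization yield follows from thermodynamic quantities, the Python sketch below computes the fraction of the limiting strand that is bound at equilibrium under a simple two-state model (strand A + strand B ⇌ duplex AB). It is an illustration under stated assumptions, not the custom functions used in the scripts: the name `fraction_bound` is hypothetical, the duplex free energy ΔG is supplied by the caller rather than derived from nearest-neighbor parameters, and salt effects are assumed to be already folded into ΔG.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_bound(delta_g_kcal, temp_c, conc_excess_m, conc_limiting_m):
    """Fraction of the limiting strand in duplex form at equilibrium.

    Two-state model: K = [AB] / ([A][B]) = exp(-dG / (R*T)), with
    concentrations in mol/L. Solving K*x^2 - (K*(a+b) + 1)*x + K*a*b = 0
    for the duplex concentration x, using the numerically stable form of
    the smaller root.
    """
    temp_k = temp_c + 273.15
    k_eq = math.exp(-delta_g_kcal / (R_KCAL * temp_k))
    a, b = conc_excess_m, conc_limiting_m
    lin = k_eq * (a + b) + 1.0
    disc = lin * lin - 4.0 * k_eq * k_eq * a * b
    duplex = 2.0 * k_eq * a * b / (lin + math.sqrt(disc))
    return duplex / b


# Conditions resembling the scripts: 60 degrees C, 5 nM excess strand,
# 0.05 nM limiting strand. The dG values are arbitrary illustrative inputs.
for dg in (-10.0, -15.0, -20.0):
    print(dg, fraction_bound(dg, 60.0, 5e-9, 0.05e-9))
```

Evaluating such a function for every connector against every complement yields a matrix like those plotted in the figures, in which a successful design has high values only on the diagonal.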

Script 1: Initial design.

clear
fHybTemp = 60; % Hybridization temperature (unit: degree C)
fConcNa = 125; % Sodium ion concentration (unit: mM)
fConcMg = 3; % Magnesium ion concentration (unit: mM)
fConcQB = 5; % Concentration of ConA# or ConB#
% Store the parameters above in struPara.
struPara.fHybTemp = fHybTemp;
struPara.fConcNa = fConcNa;
struPara.fConcMg = fConcMg;
struPara.fConcQB = fConcQB;
cChain = 'B'; % The value can be A or B to design initial ConA and ConB sequences respectively.
cFileGeneSeq = sprintf('hsTR%sV_UTR200-L-V_Sorted_FOnly.txt',cChain); % See notes
% Read V gene sequences
fidGene = fopen(cFileGeneSeq);
raGeneInfo = textscan(fidGene,'%s\t%d\t%d\t%s','Headerlines',1);
fclose(fidGene);
% Initialize the cell array (ra1on1) that stores the initially designed ConA or ConB sequences
iTotalNumOfGene = size(raGeneInfo{1},1);
ra1on1s{1} = -ones(iTotalNumOfGene,2);
ra1on1s{2} = cell(iTotalNumOfGene,4);
for iGeneNum = 1:iTotalNumOfGene
  fprintf('Designing 1on1 for gene #%u.\n',iGeneNum);
  cGeneName = raGeneInfo{1}{iGeneNum};
  cGeneSeq = raGeneInfo{4}{iGeneNum};
  iLStart = raGeneInfo{2}(iGeneNum);
  cCDS = cGeneSeq(iLStart:end);
  cAA = nt2aa(cCDS);
  iPosAAConservedC = find(cAA=='C',1,'last');
  viPosNTConservedC = iPosAAConservedC*3-2:iPosAAConservedC*3;
  vPosOfTinCys(iGeneNum) = viPosNTConservedC(1); % Position of the first nucleotide of the codon for the conserved Cys at the N terminus of CDR3
  cSA60 = cCDS(viPosNTConservedC(1)-59:viPosNTConservedC(1));
  ra1on1ofThisGene = fun_Design1on1(cSA60,struPara); % See notes
  disp(ra1on1ofThisGene{1})
  ra1on1s{1}(iGeneNum,:) = ra1on1ofThisGene{1};
  ra1on1s{2}(iGeneNum,:) = ra1on1ofThisGene{2};
end
%%
cTime = datestr(now);
cTime(cTime==' ')='_';
cTime(cTime==':')='_';
save(['IniDesign1on1_',cChain,num2str(fHybTemp),'_',cTime,'.mat'],'vPosOfTinCys','cChain','raGeneInfo','ra1on1s','struPara');
%% Compute cross hybridization yield
xFracBoundHyb_THyb = -ones(iTotalNumOfGene,iTotalNumOfGene);
for iSimQB = 1:iTotalNumOfGene
  for iDE = 1:iTotalNumOfGene
    if xFracBoundHyb_THyb(iSimQB,iDE) >= 0
      continue;
    end
    cSimQB = ra1on1s{2}{iSimQB,3}; % cSimQB = sequence of ConA/B
    cDE = ra1on1s{2}{iDE,4}; % cDE = sequence of ConA/B*
    fThisFracBound_THyb = NP_GetBoundFrac(cSimQB,fConcQB,cDE,fConcQB/100,...
      fHybTemp,'Na',fConcNa,'Mg',fConcMg); % See note
    xFracBoundHyb_THyb(iSimQB,iDE) = fThisFracBound_THyb;
  end
end
%%
save('IniDesign.mat','xFracBoundHyb_THyb','xFracBoundHyb_THybMinus5oC');
figure;
colormap(gray);
imagesc(1-xFracBoundHyb_THyb);
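The window-extraction step of Script 1 (locating the last in-frame Cys codon and taking the 60 bases that end at its first nucleotide) can be sketched in Python as follows. This is an illustrative sketch only, not the script used in this example; the function name last_cys_window is hypothetical, and the coding sequence is assumed to start in frame at the start codon.

```python
CYS_CODONS = {"TGT", "TGC"}  # standard genetic code codons for cysteine


def last_cys_window(cds: str, window: int = 60) -> str:
    """Return the `window` bases ending at the first base of the last in-frame Cys codon."""
    cds = cds.upper()
    last = None
    # Walk the reading frame codon by codon; an incomplete trailing codon is ignored.
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in CYS_CODONS:
            last = i  # 0-based index of the first base of the codon
    if last is None:
        raise ValueError("no in-frame Cys codon found")
    if last + 1 < window:
        raise ValueError("coding sequence too short for the requested window")
    # Equivalent to the 1-based MATLAB slice cCDS(p-59:p) when window == 60.
    return cds[last + 1 - window:last + 1]
```

Because the window ends on the first base of a Cys codon, every sequence it returns ends in T, which is consistent with the connector sequences listed in the tables of this disclosure.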

Notes for Script1:

The files “hsTRAV_UTR200-L-V_Sorted_FOnly.txt” and “hsTRBV_UTR200-L-V_Sorted_FOnly.txt” are TSV files recording the sequences of all TCR V genes annotated as ‘functional’ in the IMGT database. Each file has 4 columns: the 1st column is the name of the V gene; the 2nd column is the position of the first nucleotide of the start codon; the 3rd column is the position of the first nucleotide of the V gene (e.g., after L-PART2); and the 4th column is the V gene cDNA sequence starting from ~200 nt upstream of the start codon (of L-PART1).

The function “fun_Design1on1” returns the ConA or ConB sequence using two inputs: (1) the variable cSA60, which records the last 60 bases of the TRXV #_GL, and (2) parameters for thermodynamic modeling stored in the variable struPara. Briefly, the function finds the shortest continuous subsequence of cSA60 ending at the 3′ end of cSA60 (noted as ConX) that satisfies the following statement: when 5 nM of a first DNA oligonucleotide having sequence ConX and 0.05 nM of a second DNA oligonucleotide having sequence ConX* are mixed, more than 97% of the second oligonucleotide is predicted to be bound to the first oligonucleotide at the temperature, sodium ion concentration and magnesium ion concentration defined by struPara.fHybTemp, struPara.fConcNa, and struPara.fConcMg respectively. The output of this function (ra1on1ofThisGene) is a cell array with two cells. The first cell, ra1on1ofThisGene{1}, is a 1×2 vector, where ra1on1ofThisGene{1}(1) is an output not used in this example, and ra1on1ofThisGene{1}(2) is the position of the first base of ConX on cSA60. The second cell, ra1on1ofThisGene{2}, is a 1×4 cell array, where ra1on1ofThisGene{2}{1} and ra1on1ofThisGene{2}{2} are not used in this example, ra1on1ofThisGene{2}{3} is the sequence of ConX, and ra1on1ofThisGene{2}{4} is the sequence of ConX*. A skilled artisan can write this function as described above.
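The shortest-suffix search described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the function used in this example: the names design_connector and toy_bound are hypothetical, ConX* is assumed to be the reverse complement of ConX, and the thermodynamic predictor is passed in as a callable. The toy predictor included here is a crude base-count score with no physical meaning; it exists only so that the sketch runs.

```python
from typing import Callable, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (assumed here to give ConX* from ConX)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_connector(
    window: str,
    predict_bound: Callable[[str, str], float],
    threshold: float = 0.97,
) -> Optional[Tuple[int, str, str]]:
    """Find the shortest suffix of `window` whose predicted bound fraction exceeds `threshold`.

    Returns (1-based start position on the window, ConX, ConX*), or None if no suffix qualifies.
    """
    for length in range(1, len(window) + 1):
        con = window[-length:]
        anti = revcomp(con)
        if predict_bound(con, anti) > threshold:
            return len(window) - length + 1, con, anti
    return None


def toy_bound(con: str, anti: str) -> float:
    """Hypothetical stand-in for a thermodynamic model; NOT a real prediction."""
    score = sum(4 if base in "GC" else 2 for base in con)
    return min(1.0, score / 60.0)
```

In practice the callable would wrap a thermodynamic model evaluated at the conditions stored in struPara; the search loop itself does not depend on which model is used.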

The function “NP_GetBoundFrac” returns the fraction of a first DNA oligonucleotide having sequence ConX* that is bound to a second DNA oligonucleotide having sequence ConX when 5 nM (as recorded by fConcQB) of the second oligonucleotide and 0.05 nM (as recorded by fConcQB/100) of the first oligonucleotide are mixed at 60° C. (as recorded by fHybTemp) and reach equilibrium in the presence of 125 mM Na⁺ (as recorded by fConcNa) and 3 mM Mg⁺⁺ (as recorded by fConcMg).
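How a bound fraction follows from a hybridization free energy can be illustrated with a simple two-state equilibrium. The sketch below is an assumption for illustration, not a description of how “NP_GetBoundFrac” is implemented: it takes a duplex free energy as given (in practice such a value would come from a nearest-neighbor model with salt corrections, not shown) and solves A + B ⇌ AB for the fraction of the limiting strand that is bound. The function name bound_fraction is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def bound_fraction(delta_g_kcal: float, temp_c: float,
                   conc_excess_m: float, conc_limiting_m: float) -> float:
    """Fraction of the limiting strand in duplex for a two-state A + B <-> AB equilibrium.

    delta_g_kcal is the hybridization free energy at temp_c (negative favors duplex);
    concentrations are total strand concentrations in mol/L.
    """
    k_assoc = math.exp(-delta_g_kcal / (R_KCAL * (temp_c + 273.15)))
    s = conc_excess_m + conc_limiting_m + 1.0 / k_assoc
    disc = math.sqrt(s * s - 4.0 * conc_excess_m * conc_limiting_m)
    # Numerically stable form of (s - disc) / 2, the duplex concentration.
    duplex = 2.0 * conc_excess_m * conc_limiting_m / (s + disc)
    return duplex / conc_limiting_m
```

Under this simplified model, with 5 nM of one strand and 0.05 nM of the other at 60° C., a free energy of roughly −15 kcal/mol is needed before more than 97% of the limiting strand is bound.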

The image produced by this script shows a gray scale heat map of what fraction of ConX #* is predicted to be bound to ConX # when 0.05 nM of ConX #* is mixed with 5 nM of ConX # at the condition described above. As shown in FIG. 2A and FIG. 2B, substantial cross-binding (e.g., mis-connection) is present, especially for TRBV (FIG. 2B).
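The all-against-all calculation behind such a heat map can be sketched in Python as follows. This is an illustrative sketch: the names cross_matrix, misconnections and toy_predict are hypothetical, and toy_predict is a positional-identity score used only so that the sketch runs, not a thermodynamic prediction.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def cross_matrix(connectors, anti_connectors, predict_bound):
    """matrix[i][j] = predicted fraction of anti-connector j bound to connector i."""
    return [[predict_bound(con, anti) for anti in anti_connectors] for con in connectors]


def misconnections(matrix, cutoff):
    """List (i, j, value) for off-diagonal entries at or above `cutoff`."""
    return [
        (i, j, value)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if i != j and value >= cutoff
    ]


def toy_predict(con, anti):
    """Hypothetical stand-in: positional identity between con and the target of anti."""
    target = revcomp(anti)
    same = sum(x == y for x, y in zip(con, target))
    return same / max(len(con), len(target))
```

Diagonal entries correspond to intended pairs, and any off-diagonal entry returned by misconnections flags a connector pair that would be expected to mis-connect under the chosen predictor.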

Script 2. Codon diversification

clear;
load('IniDesign.mat');
fConcQB = struPara.fConcQB;
fConcNa = struPara.fConcNa;
fConcMg = struPara.fConcMg;
fHybTemp = struPara.fHybTemp;
fCodonFreqThreshold = 0.15; % Lowest allowed codon frequency
fSSThrehold = 0.6;
fCrossAssemblyThreshold = 0.02; % Highest allowed level of mis-connection
iMaxLengthDE = 35; % Maximum allowed length of ConX
struParaDiv.fCodonFreqThreshold = fCodonFreqThreshold;
struParaDiv.fSSThrehold = fSSThrehold;
struParaDiv.fCrossAssemblyThreshold = fCrossAssemblyThreshold;
struParaDiv.iMaxLengthDE = iMaxLengthDE;
%% Initiate
ra1on1NewDesign = cell(1,2);
iTotalNumOfGene = size(ra1on1s{1},1);
xCrossPrimeAlreadyDesigned_Hyb_THyb = -ones(iTotalNumOfGene);
%%
for iGeneToModify = 1:iTotalNumOfGene
  cSA60 = ra1on1s{2}{iGeneToModify,1};
  cAAInFrame = nt2aa(cSA60(3:59));
  fprintf('%s\n',cSA60);
  bHaveYouTriedInitialDesign = false;
  while (1)
    if ~bHaveYouTriedInitialDesign
      ra1on1ofThisGene{1} = ra1on1s{1}(iGeneToModify,:);
      ra1on1ofThisGene{2} = ra1on1s{2}(iGeneToModify,:);
      bHaveYouTriedInitialDesign = true;
    else % Randomize
      cTrialMiddle57 = fun_aa2nt(cAAInFrame,raCodonTable,fCodonFreqThreshold); % See note
      cAAToMakeSure = nt2aa(cTrialMiddle57,'AlternativeStartCodons','false');
      if ~strcmpi(cAAInFrame,cAAToMakeSure)
        error('something is wrong');
      end
      cTrialSA60 = [cSA60(1:2),cTrialMiddle57,'T'];
      fprintf('%s\n',cTrialSA60);
      ra1on1ofThisGene = fun_Design1on1(cTrialSA60,struPara);
    end
    iStartPosBM = ra1on1ofThisGene{1}(2);
    cTrialSimQB = ra1on1ofThisGene{2}{3};
    cTrialDE = ra1on1ofThisGene{2}{4};
    % Check DE length
    iLeftestStartPosBM = 60 - iMaxLengthDE + 1;
    if iStartPosBM < iLeftestStartPosBM
      fprintf('DE too long\n');
      continue;
    end
    %%
    vRow_THyb = -ones(1,iGeneToModify);
    vColumn_THyb = -ones(iGeneToModify,1);
    for iDE = 1:iGeneToModify
      if iDE ~= iGeneToModify
        cDE = ra1on1NewDesign{2}{iDE,4};
      else
        cDE = cTrialDE;
      end
      fFracHyb_THyb = NP_GetBoundFrac(cTrialSimQB,fConcQB,cDE,fConcQB/100,...
        fHybTemp,'Na',fConcNa,'Mg',fConcMg);
      vRow_THyb(iDE) = fFracHyb_THyb;
    end
    for iQB = 1:iGeneToModify
      if iQB ~= iGeneToModify
        cExtSimQB = ra1on1NewDesign{2}{iQB,2};
      else
        cExtSimQB = cTrialExtSimQB;
      end
      fFracHyb_THyb = NP_GetBoundFrac(cExtSimQB,fConcQB,cTrialDE,fConcQB/100,...
        fHybTemp,'Na',fConcNa,'Mg',fConcMg);
      vColumn_THyb(iQB) = fFracHyb_THyb;
    end
    vRowToUse = vRow_THyb;
    vColumnToUse = vColumn_THyb;
    if vRowToUse(end) >= 0.5 && sum(vRowToUse(1:end-1))<fCrossAssemblyThreshold && ...
        vColumnToUse(end) >= 0.5 && sum(vColumnToUse(1:end-1))<fCrossAssemblyThreshold
      ra1on1NewDesign{1}(iGeneToModify,:) = ra1on1ofThisGene{1};
      ra1on1NewDesign{2}(iGeneToModify,:) = ra1on1ofThisGene{2};
      xCrossPrimeAlreadyDesigned_Hyb_THyb(iGeneToModify, 1:iGeneToModify) = vRow_THyb;
      xCrossPrimeAlreadyDesigned_Hyb_THyb(1:iGeneToModify, iGeneToModify) = vColumn_THyb;
      break;
    end
  end
end
%%
figure;
colormap(gray)
imagesc(1-xCrossPrimeAlreadyDesigned_Hyb_THyb);

Notes for Script2:

The function “fun_aa2nt” returns a polynucleotide sequence that encodes the same polypeptide as the input sequence cAAInFrame, using the codon table information provided by the input raCodonTable, and the lowest allowed codon frequency provided by the input fCodonFreqThreshold.
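A randomized back-translation of this kind can be sketched in Python as follows. This is a sketch under stated assumptions rather than the function used in this example: the names back_translate, translate and TOY_CODON_TABLE are hypothetical, the table covers only six amino acids, its frequencies are illustrative placeholders and not measured codon usage, and codons are drawn uniformly from those at or above the threshold (the source does not state how codons are sampled).

```python
import random

# Hypothetical, partial codon table: amino acid -> {codon: relative frequency}.
# The frequencies are illustrative placeholders only.
TOY_CODON_TABLE = {
    "D": {"GAT": 0.46, "GAC": 0.54},
    "S": {"TCT": 0.19, "TCC": 0.22, "TCA": 0.15, "TCG": 0.05, "AGT": 0.15, "AGC": 0.24},
    "A": {"GCT": 0.27, "GCC": 0.40, "GCA": 0.23, "GCG": 0.10},
    "V": {"GTT": 0.18, "GTC": 0.24, "GTA": 0.12, "GTG": 0.46},
    "Y": {"TAT": 0.44, "TAC": 0.56},
    "F": {"TTT": 0.46, "TTC": 0.54},
}


def back_translate(protein, table, min_freq, rng):
    """Pick, for each residue, a random codon whose frequency is at least `min_freq`."""
    codons = []
    for aa in protein:
        allowed = sorted(c for c, f in table[aa].items() if f >= min_freq)
        if not allowed:
            raise ValueError(f"no codon for {aa!r} passes the frequency threshold")
        codons.append(rng.choice(allowed))
    return "".join(codons)


def translate(nt, table):
    """Translate using the same table, to confirm the encoded peptide is unchanged."""
    lookup = {codon: aa for aa, codons in table.items() for codon in codons}
    return "".join(lookup[nt[i:i + 3]] for i in range(0, len(nt), 3))
```

Translating the trial sequence back and comparing it with the input peptide mirrors the check that Script 2 performs before a randomized sequence is passed on for connector design.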

The image produced by this script shows a gray scale heat map of what fraction of ConX #* is predicted to be bound to ConX # when 0.05 nM of ConX #* is mixed with 5 nM of ConX # at the condition described above after codon diversification. As shown in FIG. 3A and FIG. 3B, only specific hybridization is predicted to occur to a noticeable extent. Thus, this example shows that the codon diversification scheme is feasible, and shows how to obtain codon-diversified ConA and ConB sequences.
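The acceptance test that Script 2 applies to each trial design can be restated in Python as follows. The function name accept_design is hypothetical; the thresholds (0.5 for the intended pair, 0.02 for the summed cross-binding) are the ones that appear in Script 2. Here `row` holds the predicted binding of the trial connector to each previously accepted anti-connector followed by its own anti-connector, and `column` holds the binding of each previously accepted connector, followed by the trial connector, to the trial anti-connector.

```python
def accept_design(row, column, on_target_min=0.5, cross_max=0.02):
    """Return True if a trial connector passes the Script 2 acceptance condition.

    The last element of `row` and of `column` is the intended (on-target) pair;
    all earlier elements are cross-binding to previously accepted designs.
    """
    return (
        row[-1] >= on_target_min
        and sum(row[:-1]) < cross_max
        and column[-1] >= on_target_min
        and sum(column[:-1]) < cross_max
    )
```

A trial that fails this test is discarded and a new codon-randomized sequence is drawn, so designs accepted later in the loop are constrained by every design accepted before them.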

Example 3. Connector Sequences Derived from Mouse TRAV and TRBV Genes

This example provides codon-diversified connector sequences derived from mouse TRAV and TRBV genes. Similar to the above examples, ConA is the connector for a specific TRAV sequence, and ConB is the connector for a specific TRBV sequence. The codon diversification was performed using the same methods as described in Example 2. Table 1 shows codon-diversified connector sequences derived from mouse TRAV genes. Table 2 shows codon-diversified connector sequences derived from mouse TRBV genes. In Tables 1 and 2, each row lists the SEQ ID NO, the gene name and accession number of the V gene, and the corresponding connector sequence.

TABLE 1 Connector sequences derived from mouse TRAV genes

SEQ ID NO:    Connector of mouse TRAV genes    Connector sequence
1    > ConA of Trav1|ENSMUST00000103567.5    TCAAAGACTCTGCCTCATACCTCT
2    > ConA of Trav2|ENSMUST00000196939.1    CTGAGAGACGCAGCTGTGTATTACT
3    > ConA of Trav3-1|ENSMUST00000103569.2    GGGGACTCAGCCGTGTACTTCT
4    > ConA of Trav3-3|ENSMUST00000181768.2    GGTGACTCCGCAGCCTATTTCT
5    > ConA of Trav3-4|ENSMUST00000103670.3    TGGGGATAGCGCAGTCTATTTCT
6    > ConA of Trav3d-3|ENSMUST00000196023.1    CCGGAGACAGCGCAGTTTATTTTT
7    > ConA of Trav3n-3|ENSMUST00000197557.1    GGTGACAGCGCCGTCTATTTTT
8    > ConA of Trav4-2|ENSMUST00000103637.5    TGGAGGACTCAGGCACTTACTTCT
9    > ConA of Trav4-3|ENSMUST00000103655.2    TGGAGGACTCTGGGACATACTTTT
10    > ConA of Trav4-4-dv10|ENSMUST00000103663.5    TGGAGGACTCTGGCACCTATTTTT
11    > ConA of Trav4d-3|ENSMUST00000103592.1    ACTCGAGGATTCCGGTACTTATTTCT
12    > ConA of Trav4d-4|ENSMUST00000103600.2    GAAGACTCCGGGACCTACTTTT
13    > ConA of Trav4n-3|ENSMUST00000103618.1    GCTGGAGGATTCCGGAACCTATTTCT
14    > ConA of Trav4n-4|ENSMUST00000103627.2    CTCGAAGATAGCGGCACATATTTTT
15    > ConA of Trav5-1|ENSMUST00000103570.1    GCCTGGTGATAGCGCAATATACTTCT
16    > ConA of Trav5d-4|ENSMUST00000179701.1    TGGCGACTCTGCAATGTACTTCT
17    > ConA of Trav5n-4|ENSMUST00000179997.1    CCCGGAGACTCTGCTATGTATTTTT
18    > ConA of Trav6-1|ENSMUST00000103571.1    GGAATCCGATAGCGCAGTCTATTACT
19    > ConA of Trav6-2|ENSMUST00000198058.1    TCCGACAGCGCTGTCTACTACT
20    > ConA of Trav6-3|ENSMUST00000180549.2    AAGAGATTGATAGCGCTGTTTACTACT
21    > ConA of Trav6-4|ENSMUST00000184650.1    AGGAATCTGATTCCGCAGTCTATTTTT
22    > ConA of Trav6-5|ENSMUST00000181210.2    AGAATCTGATAGCGCCGTTTATTATT
23    > ConA of Trav6-6|ENSMUST00000103584.3    GTCCGACTCCGCAGTCTACTACT
24    > ConA of Trav6-7-dv9|ENSMUST00000103638.5    AGGAGTCTGATTCTGCAGTCTACTATT
25    > ConA of Trav6d-3|ENSMUST00000181483.2    CCAAGAAATAGATTCCGCAGTCTACTACT
26    > ConA of Trav6d-4|ENSMUST00000180717.2    GTCTGACAGCGCAGTCTACTTCT
27    > ConA of Trav6d-5|ENSMUST00000180687.2    AGGAAAGCGATTCTGCAGTCTATTACT
28    > ConA of Trav6d-6|ENSMUST00000197754.1    AAGAGTCTGACTCCGCAGTTTATTATT
29    > ConA of Trav6d-7|ENSMUST00000178650.2    AGAATCCGACTCTGCAGTTTACTATT
30    > ConA of Trav6n-5|ENSMUST00000103611.1    GAGTCTGATAGCGCTGTGTACTACT
31    > ConA of Trav6n-6|ENSMUST00000181793.2    GAATCTGACTCTGCCGTTTACTATT
32    > ConA of Trav6n-7|ENSMUST00000179607.2    AGTCCGACTCTGCTGTGTACTACT
33    > ConA of Trav7-1|ENSMUST00000198019.1    CCATCTGATTCCGCACTGTATTTCT
34    > ConA of TRAV7-2|AC004407    ACCTTCTGATAGCGCTCTCTATTTTT
35    > ConA of Trav7-3|ENSMUST00000177622.3    CCTTCTGATTCTGCACTGTACCTGT
36    > ConA of Trav7-4|ENSMUST00000181728.2    CCAAGCGATTCTGCACTGTATTTTT
37    > ConA of Trav7-5|ENSMUST00000200609.1    CCTCTGACTCTGCAGTCTACCTCT
38    > ConA of Trav7-6|ENSMUST00000103641.5    CCCAGCGACTCTGCAGTTTATCTCT
39    > ConA of Trav7d-2|ENSMUST00000200127.1    CCGACAGCGCACTCTACCTGT
40    > ConA of Trav7d-3|ENSMUST00000179789.3    TTCCGACTCTGCACTGTATCTGT
41    > ConA of Trav7d-4|ENSMUST00000178768.3    TCCGATAGCGCCCTGTATTTCT
42    > ConA of Trav7d-5|ENSMUST00000197128.1    CTCCGATTCCGCACTCTATCTCT
43    > ConA of Trav7D-6|ENSMUST00000196756.1    CCTCCGATAGCGCTGTTTATCTCT
44    > ConA of Trav7n-4|ENSMUST00000103609.1    GCGACAGCGCCCTGTACTTTT
45    > ConA of Trav7n-5|ENSMUST00000199753.1    CCCTCTGATAGCGCACTGTATCTCT
46    > ConA of Trav7n-6|ENSMUST00000178100.2    CTTCTGACAGCGCTGTGTATCTGT
47    > ConA of Trav8-1|ENSMUST00000103643.3    GCGAGGACACAGCTGTTTACTTTT
48    > ConA of TRAV8-2|AC004096    AGTGCGAAGATACAGCAGTTTACTTCT
49    > ConA of Trav8d-1|ENSMUST00000103580.3    CGGTGTGAGGATACTGCTGTTTATTTCT
50    > ConA of Trav8d-2|ENSMUST00000198439.4    CGAAGATACCGCCGTCTACTTTT
51    > ConA of Trav8n-2|ENSMUST00000103632.3    GGCAACTGACACAGCAGTCTACTTTT
52    > ConA of Trav9-1|ENSMUST00000103581.5    GAGCGATTCTGCCGTTTACTTCT
53    > ConA of Trav9-2|ENSMUST00000103654.2    TCCGATTCCGCCGTGTATTTTT
54    > ConA of Trav9-4|ENSMUST00000103662.5    TTGGTCTGATTCTGCAGTTTACTTTT
55    > ConA of Trav9d-1|ENSMUST00000178426.3    GGTCTGATTCCGCTGTCTACTTTT
56    > ConA of Trav9d-2|ENSMUST00000199746.4    CTGGTCTGACTCTGCTGTTTATTTTT
57    > ConA of Trav9d-3|ENSMUST00000178252.2    GGTCCGACTGGGCAGTCTATTTTT
58    > ConA of Trav9d-4|ENSMUST00000200548.1    TGGTCTGATTCTGCCGTCTATTTCT
59    > ConA of Trav9n-2|ENSMUST00000198913.4    AGCGACTCTGCCGTGTATTTCT
60    > ConA of Trav9n-3|ENSMUST00000177705.2    GAGCGATTGGGCAGTCTACTTTT
61    > ConA of Trav9n-4|ENSMUST00000103626.2    TGGTCCGATTCTGCTGTCTATTTTT
62    > ConA of Trav10|ENSMUST00000103583.4    AGCCTGAAGATTCAGCCATCTACTTCT
63    > ConA of Trav10d|ENSMUST00000103646.4    CCCGAGGACTCTGCTATTTACTTCT
64    > ConA of Trav10n|ENSMUST00000103612.1    ACAGCCAGAAGATTCTGCAATATACTTCT
65    > ConA of Trav11|ENSMUST00000103585.3    GCTCGATGACACAGCTACATACATCT
66    > ConA of TRAV11D|AC004101    TCCTGGATGATACTGCAACATACATAT
67    > ConA of Trav12-1|ENSMUST00000200115.1    CTCTCTGACTCTGCACTGTACTACT
68    > ConA of Trav12-2|ENSMUST00000180972.2    ACTGTCTGACTCTGCACTCTATTACT
69    > ConA of Trav12-3|ENSMUST00000103657.5    CTGTCCGATTCTGCACTCTACTACT
70    > ConA of Trav12d-1|ENSMUST00000181360.2    AACTGTCTGATTCTGCTCTGTACTATT
71    > ConA of Trav12d-2|ENSMUST00000103593.2    TCCGACTCCGCTCTGTATTTTT
72    > ConA of Trav12d-3|ENSMUST00000177703.2    AGCGACTCTGCCCTCTACTACT
73    > ConA of Trav12n-1|ENSMUST00000198682.1    TCTCTGACTCCGCTCTCTACTACT
74    > ConA of Trav12n-2|ENSMUST00000103619.2    TCTCTGATTCTGCCCTCTACTTTT
75    > ConA of Trav12n-3|ENSMUST00000179583.2    GCTCTCCGATTCTGCTCTGTATTATT
76    > ConA of Trav13-1|ENSMUST00000103651.3    GACAACAGACTCAGGCACTTATCTCT
77    > ConA of Trav13-2|ENSMUST00000103658.3    AACAACTGACTCTGGCACATATTTTT
78    > ConA of TRAV13-3|AC003995    CACTGATAGCGGAACCTACCTCT
79    > ConA of Trav13-4-dv7|ENSMUST00000180380.2    TGACAGCGGCACCTACCTGT
80    > ConA of Trav13-5|ENSMUST00000103671.3    ACTACAGATTCCGGCACTTACTTCT
81    > ConA of Trav13d-1|ENSMUST00000103588.3    GCCAGATAACTGATTCTGGTACTTACCTGT
82    > ConA of Trav13d-2|ENSMUST00000197954.1    ACAACTGACAGCGGAACATATCTCT
83    > ConA of Trav13d-3|ENSMUST00000179512.2    AAATAACAGATAGCGGTACATACCTGT
84    > ConA of Trav13d-4|ENSMUST00000196079.1    CCACAGATTCTGGCACCTACTTCT
85    > ConA of Trav13n-1|ENSMUST00000198359.1    ACTGACTCCGGAACCTACCTCT
86    > ConA of Trav13n-2|ENSMUST00000196941.1    ACCGACTCTGGCACTTACCTGT
87    > ConA of Trav13n-3|ENSMUST00000179580.2    AATCACAGACTCTGGAACCTATCTGT
88    > ConA of Trav13n-4|ENSMUST00000196105.1    CCAAATTACCGATTCTGGTACATACCTCT
89    > ConA of Trav14-1|ENSMUST00000198297.1    CGGAGATAGCGCCACATACTTTT
90    > ConA of Trav14-2|ENSMUST00000179267.3    AACCTGGAGATTCTGCAACATATTTCT
91    > ConA of Trav14-3|ENSMUST00000103589.5    CTGGGGACTCTGCAACTTACTTCT
92    > ConA of Trav14d-1|ENSMUST00000181038.2    CCTGGAGACTCAGCTACCTACTTCT
93    > ConA of Trav14d-2|ENSMUST00000196802.1    CCGGGGATAGCGCTACTTATTTTT
94    > ConA of Trav14d-3-dv8|ENSMUST00000103608.3    CCTGGAGATTCCGCAACTTACTTTT
95    > ConA of Trav14n-1|ENSMUST00000177578.1    CCAGGGGATTCTGCTACCTATTTTT
96    > ConA of Trav14n-2|ENSMUST00000197614.1    CCCGGAGATTCTGCCACTTATTTCT
97    > ConA of Trav14n-3|ENSMUST00000103652.4    CTGGCGACAGCGCTACTTATTTCT
98    > ConA of Trav15-1-dv6-1|ENSMUST00000103653.2    AACCAGACGATTCGGGAAAGTATTTCT
99    > ConA of Trav15-2-dv6-2|ENSMUST00000103660.3    CAGAGGATTCAGGGACGTACTTCT
100    > ConA of Trav15d-1-dv6d-1|ENSMUST00000103616.4    CCAGACGACTCCGGAAAGTACTTTT
101    > ConA of Trav15d-2-dv6d-2|ENSMUST00000199800.1    CCGAGGACTCCGGTACATACTTCT
102    > ConA of Trav15n-1|ENSMUST00000103590.3    AACCCGATGACTCTGGTAAGTATTTTT
103    > ConA of Trav15n-2|ENSMUST00000199112.1    GCCAGAAGACTCCGGTACATATTTTT
104    > ConA of Trav16|ENSMUST00000103667.5    TCAAATTGAAGATTCTGCAGTCTACTTTT
105    > ConA of Trav16d-dv11|ENSMUST00000103606.1    GATTGAGGACTCGGCAGTATATTTCT
106    > ConA of Trav16n|ENSMUST00000199280.1    AAATCGAAGACTCTGCAGTTTACTTTT
107    > ConA of Trav17|ENSMUST00000103672.8    GAGCGACTCAGCCAAGTACTTCT
108    > ConA of Trav18|ENSMUST00000103673.10    AGGGGATGCTGGGATCTACTTTT
109    > ConA of Trav19|ENSMUST00000103674.5    CCCGAAGATACAGCTGTCTACCTGT
110    > ConA of Trav21-dv12|ENSMUST00000180938.2    AGGGACGCAGCAGTCTATCATT
111    > ConA of Trav23|ENSMUST00000199137.1    GCCACTCTGCCATCTACTTCTGT

TABLE 2 Connector sequences derived from mouse TRBV genes

SEQ ID NO:    Connector of mouse TRBV genes    Connector sequence
112    > ConB of Trbv1|ENSMUST00000103262.2    GGCGCACACTGTACTGCACAT
113    > ConB of Trbv2|ENSMUST00000103263.2    TGATGACTCGGCCACATACTTCT
114    > ConB of Trbv3|AE000663    TGGAGGACTCAGCTGTGTACTTCT
115    > ConB of Trbv4|ENSMUST00000103265.4    ACCAGAAGATAGCGCAGTTTATCTGT
116    > ConB of Trbv5|ENSMUST00000103266.2    ATCCAGAAGACTCAGCTGTCTATTTTT
117    > ConB of Trbv12-1|M15614    TCGAAGATAGCGCCATGTACTTTT
118    > ConB of Trbv12-2|M15613    ACTGGAAGATAGCGCTGTGTATTTCT
119    > ConB of Trbv13-1|ENSMUST00000194399.1    AAGCCAGACCAGCCTCTATTTTT
120    > ConB of Trbv13-2|ENSMUST00000103270.3    CCCCTCTCAGACATCAGTGTACTTCT
121    > ConB of Trbv13-3|ENSMUST00000103271.1    GCCAGACCGCCGTGTATTTCT
122    > ConB of Trbv14|ENSMUST00000103272.3    GGCGACACAGCCACCTATCTCT
123    > ConB of Trbv15|ENSMUST00000103273.2    GCCTAAAGACAGCGCTGTTTATCTCT
124    > ConB of Trbv16|ENSMUST00000103274.3    CCAGGACTCAGCGGTGTATCTTT
125    > ConB of Trbv17|ENSMUST00000103275.3    GCCTAGAGTATTCTGCCATGTACCTCT
126    > ConB of Trbv19|ENSMUST00000103276.2    AAAAATGAGATGGCAGTCTTCCTCT
127    > ConB of Trbv20|ENSMUST00000103277.1    CGAGGATAGGGGCCTGTATCTCT
128    > ConB of Trbv23|ENSMUST00000193997.5    GCAGAAGACTCAGCACTGTACTTGT
129    > ConB of Trbv24|IMGT    GACGACTCAGCACTGTACCTCT
130    > ConB of Trbv26|ENSMUST00000193064.1    GGGGACTCCGCACTCTATCTCT
131    > ConB of Trbv29|ENSMUST00000103281.2    AAACAAACCAGACATCTGTGTACTTCT
132    > ConB of Trbv30|ENSMUST00000103282.2    GGCCTGGAGACAGCAGTATCTATTTCT
133    > ConB of Trbv31|ENSMUST00000193003.1    TCAGCCATAGCGGTTTTTACCTCT

In the initial design, ConA # and ConB # sequences are designed according to the original TRAV or TRBV sequences. As used herein, the symbol # denotes a numerical ID of a TRAV or TRBV gene. Hybridization yield of every ConX # to every ConX #* is then computed to serve as a baseline (FIG. 10A and FIG. 10B). FIG. 10A shows hybridization yield of the connector sequences designed according to the original TRAV sequences without codon diversification (ConA # to ConA #*). FIG. 10B shows hybridization yield of the connector sequences designed according to the original TRBV sequences without codon diversification (ConB # to ConB #*). During codon diversification, the codon choices of the last ~60 bases of some of the TRXV #_GL are randomized, and ConX # sequences that allow specific hybridization are chosen. Next, hybridization yield of every ConX # to every ConX #* using the codon-diversified sequence set is calculated to see if the codon diversification was successful (FIG. 11A and FIG. 11B). FIG. 11A shows hybridization yield of the codon-diversified connector sequences (ConA # to ConA #*). FIG. 11B shows hybridization yield of the codon-diversified connector sequences (ConB # to ConB #*).

Example 4. Connector Sequences with Arbitrary Sequences

Table 3 provides arbitrary sequences that can be used as connector sequences to link CDR3-J polynucleotides and the designated V gene germline polynucleotides according to the scheme described in FIG. 7.

TABLE 3 Connector sequences with arbitrary sequences

SEQ ID NO:    Connector sequence    SEQ ID NO:    Connector sequence
134    CCGGGATTTTGTGACTCATC    209    CTTGTCCACTAAACGCAACG
135    GAGGATCGTATGTTTCGCAC    210    CGGGTATCACTGGGTAATGA
136    CTTGTGTGCACTTACCGTAC    211    GGAACAGAGACCAATCCAGT
137    TGATGCATCTCCAGTACAGG    212    GTGGGCATCCGAAATTTCAG
138    CTTCTGTGTGTACCTCGACA    213    CGCGACGACATTACCAATAG
139    CTTGCAATCCTTTACCGTGC    214    CGTCGGAATATGCTCTCAGA
140    GCAAGTGTGGAAAATGACCC    215    GATGCAGATCAATGAGTGGC
141    GAGTCTAGTCTCACAACCCA    216    ACTGCTTACAAGTGTCCACG
142    GAAATGTTGAGGACTCCACG    217    ACTGTATGCAAGCTAGTCCC
143    CCTAACAGATGCTACGTGGA    218    AGATCTCCCAAAAGTGTCCG
144    GTAGGTCCACACAGATTCCA    219    TTCCAGAACCATGTGATCCC
145    GCCAGTCACAGCAAATACAC    220    GCCTTGTCTTTCAACCTCTG
146    CCGCTACCAGTATGTACCTT    221    GATACGGATCTTCACATGCG
147    ACTGTGTTCCTTGTCTTCCG    222    CGCTCATCTAGGTTGGACTA
148    TGAATGCATCTACGGTACCG    223    CGCGTTCAGATTCCAAACAG
149    GCGCTTATCAATCTTGCTCG    224    GCCTGGTTACACATGCTATC
150    CGGTCAATTCAGTAGCCACT    225    GCAAAGGTCCTACAGGTTTC
151    GGACACATGTACACTAGCCA    226    GGCTTTCCATGTCTATGCTC
152    TGGGAGCTCTACGAAAATCC    227    GCGACATTAGCAGAGTAGGT
153    GTTCTCGAGATCGTCACACA    228    CTCGCCATACTATCTGCATG
154    CTTCTGCATTCGATCCTTGC    229    CTACTGAACACTTGGCAAGC
155    GCAGAGTTGTGTGATTGGAG    230    CTGTTCAATTCCTGTGCGAG
156    CCCATCAATTCGGAACCATC    231    CACTGAGATGGAATTTGGCG
157    ATCGTAACCCAAGTCTGTGG    232    GTCCATCACAACTTCCACTG
158    GGCGAAATGATCCCTGAATG    233    GGCTCAGTCTACTTTGCTTC
159    AGTGCTCAGAACTTTCAGGC    234    GTTAGCTTCCGACACAATGG
160    GATCGTTAACTCTTTGGGCG    235    CCGTGACACACTTTCATCTC
161    ACACGAGGATTGCTGTAGAG    236    CCGGGATGTCATTATGAGCA
162    TTCTACCACATTGTCTCGGG    237    GAGTGTCCTACGAGATCAGA
163    GAATGGCTAAACTGTGTCCC    238    TAACGTCTCTCTGAGTGTGG
164    CGGACTGTACGAGAAACTGA    239    ACCCTAGACAAGAGACACCT
165    CTTGCGACAAACTACTCCTG    240    TGCTCAGTACTCTTCATCGC
166    CCGTTTTACTTTGTCGCCAC    241    AGCTCAATCATGGCTATCGG
167    TGGATGATATCACTTCGGCG    242    CTACACATTGCATCCAACCC
168    CCAACCTCTATATGTGCCCA    243    ACTTGTCGAATAGCTCAGGC
169    TGAACAGGTATGCTCCAGAG    244    GTCTACCCTGAGAACCAGTT
170    TTGTGGATATCGTCTGGTCC    245    CAGCAACAACCTACCTTAGC
171    CTGTGGAACTCGACTCTTGT    246    CGATTTGTTGGTACGTGTCC
172    CTCCAGGATGCACAAATTCG    247    GCCATTTCCTTTGTACCTGC
173    GGCTCATGACAAAACACAGG    248    GGCCAATAGAGAGACCACAA
174    CCGAATCCGAAAACAACACG    249    CGGAGTCACATGGGTAGAAT
175    AGACCTAACACTGTGATGCG    250    CCCAGTACATTTGTCGGTTG
176    CCTGGGTGAGCATAAACTTC    251    CCCACTAGCTGCTACTCAAA
177    GAGTCTTGGACGAACAAAGG    252    GGTGTTGCGTCAAAGTAGAC
178    CACGTACCCATCATGTTCTG    253    TACTCCAGCTCTTACTGTGC
179    CCGTGTTAGTCAAGTGTGTG    254    GGATGAGCAGTCAACAGTTC
180    CTGGTGGCATAAATGGAACG    255    TCAGGATCGATCAGTTGTCC
181    TGGATGTGGGTATCAATGGG    256    CCTCTCTTTTGTGCGGAAAC
182    TGTGGCTAACGTAGGACAAG    257    TGCCTAGGATTTCGAGAACG
183    CCCTCGTTGTGAAAATGTGC    258    GGCATTGTCCTTAACTTCGC
184    TCGTCATAGGTCAGCTTACG    259    TGCATCTAACTACGATGGGC
185    CCTGATGACCTCTATGCCAA    260    CCCTAGTAGCCACACAACAT
186    CGGCAAGAATGAATAGGGTG    261    GTGCCATGAATCATCGTCTC
187    GTGCTATTGGTGGGAAATGG    262    CGCTCTGATGAAAGCTCCAT
188    GCCATGTTTGCTTACTGACG    263    CCAGCCATAGTGCATATCCT
189    CGTTGTGGCATTCATTAGCG    264    GAGATTGTCATGTGGTCGAC
190    GCGGTAGGATTGGATCTCAT    265    CCGCAGTCTAACAGGAAATC
191    CCTCGCAAAGCTGTTATGAC    266    CGCTTCGACTGAACCTTATG
192    GCCTTCATGTTATTGGACGC    267    CGATGCGACCAATAGAAGTG
193    AGCTGTAGTGTTCTTGAGGC    268    GCCCTTGGTACGACATATTG
194    GGTAGTGTTCGTGTGACATG    269    CAGTGATTTAGGTGACGCAG
195    CGCGGCATATGTTCATATCC    270    GGCATGGAAGAGGTAGTTTC
196    GAGACTGGATCATGCAACAG    271    CCGATCGTATTCTGTGTCCA
197    CACAACTTCTCTGGACTCCA    272    CTAAGTCAAGCACATGGGAC
198    CGACCATGATCTGTATGCGT    273    GATCCACACTCAATCTCCTG
199    GGTGTGACTCTTGTTTCCGT    274    CCTTGTCACATGCTGGTATC
200    ACGTACATACAAGTCTGGCG    275    CGCGATTGTGGTTAATAGGC
201    CCTCAAGGATTCACTCGCAA    276    GTAGGCAAAGTTCACCACAC
202    CTGTATAGGATGTCCACGCA    277    GCCACGAATCGAACAAGTAC
203    GCCTGTGATTGGTAAATGCG    278    TTGAGATCTCGATGAGCACG
204    CGCACTCGTAGCATCTAGAA    279    GGGCCAAGATCTATTCGTCA
205    CGATTTGTTGTCCCTAGCTG    280    GTGGCTATAGGTATGTCCGA
206    CCCACTTCATCTGACTCTGA    281    CCACACTTTCTGCATTCGAC
207    CGGCATTGTACAGGTGTTAC    282    CGGCATCTCAAAGCACATAC
208    TCTCCTATTTCCCTGAACGG    283    CGTCCACAAATTTACTGCCC

Example 5. Characterization of Assembled TCR Genes Using Next-Generation Sequencing

A pool of nucleic acid sequences encoding paired TCRs was prepared using the methods described herein (e.g., Example 1 with some modifications). The reference sequences encoding natively paired TCRs were obtained from publicly available libraries. 553 reference sequences were selected for demonstration in this example. In this example, the nucleic acid sequences encoding CDR3-Jα (or CDR3-Jα fragments) and nucleic acid sequences encoding CDR3-Jβ (or CDR3-Jβ fragments) were separately synthesized. Alternatively, the paired CDR3-Jα and CDR3-Jβ can be synthesized together on one fragment.

553 CDR3-Jα fragments and 553 CDR3-Jβ fragments were synthesized and connected (e.g., by ligation, overlapping PCR, etc.) together to generate a pool of paired CDR3-Jα-CDR3-Jβ fragments. To ensure that a CDR3-Jα was ligated to the natively paired CDR3-Jβ, an arbitrary connector sequence was synthesized on each CDR3-Jα, and the arbitrary connector sequence was designed to minimize cross-hybridization with other arbitrary connector sequences in the pool of CDR3-Jα fragments. The complementary sequence of the arbitrary connector sequence was synthesized on the natively paired CDR3-Jβ. Next, a pool of TRAV fragments (pre-synthesized according to the reference sequences) was connected to the paired CDR3-Jα-CDR3-Jβ fragments to generate a pool of TRAV-CDR3-Jα-CDR3-Jβ fragments, each comprising a TRAV sequence connected to its cognate CDR3-Jα. Next, a TRBC1 sequence was appended downstream of the TRAV-CDR3-Jα-CDR3-Jβ fragments to form TRAV-CDR3-Jα-CDR3-Jβ-TRBC1 fragments. These fragments were circularized and re-linearized by cutting immediately upstream of the CDR3-Jβ, forming CDR3-Jβ-TRBC1-TRAV-CDR3-Jα fragments. The TRBC1 and TRAV fragments were designed such that an in-frame self-cleaving P2A sequence connects TRBC1 and TRAV. Next, a pool of TRBV fragments (pre-synthesized according to the reference sequences) was connected to the CDR3-Jβ-TRBC1-TRAV-CDR3-Jα fragments to generate TRBV-CDR3-Jβ-TRBC1-TRAV-CDR3-Jα fragments, which were subjected to next-generation sequencing (NGS) to assess the abundance and connection accuracy of the clones. Here, each clone in the NGS data refers to a unique sequence. Since 553 reference sequences were used in this example, there were a total of 553 clones in the NGS data. For the data analysis described herein, CDR3-Jα sequences were used to represent clones.
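The effect of the circularization and re-linearization step on segment order can be pictured as a rotation of an ordered list. The following Python sketch is only an illustration of that ordering logic; the function name relinearize and the ASCII segment labels are hypothetical.

```python
def relinearize(segments, cut_before):
    """Join the ends of a linear construct, then cut immediately upstream of `cut_before`."""
    i = segments.index(cut_before)
    # Circularizing and cutting before segment i is a rotation of the segment order.
    return segments[i:] + segments[:i]


linear = ["TRAV", "CDR3-Jalpha", "CDR3-Jbeta", "TRBC1"]
rotated = relinearize(linear, "CDR3-Jbeta")
```

With the input above, rotated places CDR3-Jbeta first and CDR3-Jalpha last, which exposes the CDR3-Jbeta end for the subsequent connection to the TRBV fragments and places TRBC1 directly upstream of TRAV.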

FIG. 12 shows accuracy and abundance of each clone after generation of the paired CDR3-Jα-CDR3-Jβ fragments. Each data point corresponds to a clone of a CDR3-Jα-CDR3-Jβ fragment. Accuracy refers to the fraction of CDR3-Jα fragments that are connected to the cognate CDR3-Jβ fragments. For each CDR3-Jα, the accuracy can be calculated as the number of correctly connected CDR3-Jβ fragments divided by the total number of connected CDR3-Jβ fragments. Abundance refers to the fraction of each clone in the total pool of clones, which can be calculated as the total number of reads of that clone divided by the total number of reads of all clones. The data show that 497 out of 553 clones have an accuracy higher than 95% and an abundance higher than 0.1/553, as indicated in the box.
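The accuracy and abundance definitions above can be sketched in Python as follows. This is an illustrative sketch, not the analysis pipeline used in this example: the names clone_metrics and passing_clones are hypothetical, reads are assumed to be already summarized as counts per observed (CDR3-Jα, CDR3-Jβ) pair, and the reads of a clone are taken to be all reads carrying its CDR3-Jα.

```python
from collections import defaultdict


def clone_metrics(pair_reads, cognate):
    """Return {alpha_id: (accuracy, abundance)}.

    pair_reads maps (alpha_id, beta_id) to a read count;
    cognate maps each alpha_id to its natively paired beta_id.
    """
    reads_per_alpha = defaultdict(int)
    correct = defaultdict(int)
    for (alpha, beta), n in pair_reads.items():
        reads_per_alpha[alpha] += n
        if cognate.get(alpha) == beta:
            correct[alpha] += n
    total = sum(reads_per_alpha.values())
    return {
        alpha: (correct[alpha] / reads, reads / total)
        for alpha, reads in reads_per_alpha.items()
        if reads > 0
    }


def passing_clones(metrics, n_clones, min_accuracy=0.95, abundance_factor=0.1):
    """Clones with accuracy above min_accuracy and abundance above abundance_factor / n_clones."""
    cutoff = abundance_factor / n_clones
    return sorted(a for a, (acc, ab) in metrics.items() if acc > min_accuracy and ab > cutoff)
```

The abundance cutoff of 0.1/553 corresponds to one tenth of the abundance each clone would have if all 553 clones were equally represented.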

FIG. 13 shows accuracy and abundance of each clone after generation of the TRAV-CDR3-Jα-CDR3-Jβ fragments. Each data point corresponds to a clone of a TRAV-CDR3-Jα-CDR3-Jβ fragment. Accuracy refers to the fraction of CDR3-Jα-CDR3-Jβ fragments that are connected to the cognate TRAV fragments. For each CDR3-Jα-CDR3-Jβ, the accuracy can be calculated as the number of correctly connected TRAV fragments divided by the total number of connected TRAV fragments. Abundance refers to the fraction of each clone in the total pool of clones, which can be calculated as the total number of reads of that clone divided by the total number of reads of all clones. The data show that 523 out of 553 clones have an accuracy higher than 95% and an abundance higher than 0.1/553, as indicated in the box.

FIG. 14 shows a heatmap mapping each TRAV to each clone in the pool. The clone number is ranked according to its cognate TRAV gene name. The data show that, for each clone, the majority of reads have the correct TRAV sequence, indicating high accuracy when connecting CDR3-Jα-CDR3-Jβ fragments to their cognate TRAV fragments.

FIG. 15 shows abundance of each clone after generating TRAV-CDR3-Jα-CDR3-Jβ fragments (e.g., TRAV addition in FIG. 15) versus abundance after generating CDR3-Jα-CDR3-Jβ fragments. The data show that the overall bias is dominated by the bias introduced during the ligation of CDR3-Jα and CDR3-Jβ fragments. This bias may be reduced or avoided by directly synthesizing paired CDR3-Jα-CDR3-Jβ fragments.

FIG. 16 shows accuracy and abundance of each clone after generation of the TRBV-CDR3-Jβ-TRBC1-TRAV-CDR3-Jα fragments. Each data point corresponds to a clone of a TRBV-CDR3-Jβ-TRBC1-TRAV-CDR3-Jα fragment. Accuracy refers to the fraction of CDR3-Jβ-TRBC1-TRAV-CDR3-Jα fragments that are connected to the cognate TRBV fragments. For each CDR3-Jβ-TRBC1-TRAV-CDR3-Jα, the accuracy can be calculated as the number of correctly connected TRBV fragments divided by the total number of connected TRBV fragments. Abundance refers to the fraction of each clone in the total pool of clones, which can be calculated as the total number of reads of that clone divided by the total number of reads of all clones. The data show that 514 out of 553 clones have an accuracy higher than 95% and an abundance higher than 0.1/553, as indicated in the box.

FIG. 17 shows a heatmap mapping each TRBV to each clone in the pool. The clone number is ranked according to its cognate TRBV gene name. The data show that, for each clone, the majority of reads have the correct TRBV sequence, indicating high accuracy when connecting CDR3-Jβ-TRBC1-TRAV-CDR3-Jα fragments to their cognate TRBV fragments.

FIG. 18 shows overall accuracy and abundance of each clone after generation of the TRBV-CDR3-Jβ-TRBC1-TRAV-CDR3-Jα fragments. The overall accuracy for each clone was calculated by multiplying the accuracies of the individual steps shown in FIGS. 12, 13 and 16. The abundance was calculated as the total number of reads of that clone divided by the total number of reads of all clones.
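The overall accuracy calculation is a product of per-step accuracies, which can be written in Python as follows. The function name overall_accuracy is hypothetical and the input values in the usage note are illustrative, not measured data.

```python
import math


def overall_accuracy(step_accuracies):
    """Overall accuracy of a clone as the product of its per-step connection accuracies."""
    return math.prod(step_accuracies)
```

As an illustration, a clone that just meets a 95% accuracy at each of the three connection steps would have an overall accuracy of 0.95 × 0.95 × 0.95 = 0.857375, so per-step errors compound across the assembly.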

While various embodiments of the invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.

Embodiment Paragraphs

The present disclosure provides:

[1] A method for generating a nucleic acid molecule encoding a T-cellreceptor (TCR) chain or portion thereof, comprising: (a) providing atleast one nucleic acid molecule comprising a sequence encoding a CDR3 ofa TCR chain; (b) providing a plurality of nucleic acid molecules, eachnucleic acid molecule of the plurality comprising a sequence derivedfrom a TCR V gene, wherein the plurality of nucleic acid moleculescomprises at least two different sequences derived from at least twodifferent TCR V genes; and (c) contacting the at least one nucleic acidmolecule of (a) to the plurality of nucleic acid molecules of (b) in asame compartment, wherein the at least one nucleic acid molecule of (a)is capable of linking to a nucleic acid molecule of the plurality ofnucleic acid molecules to generate a third nucleic acid moleculecomprising the sequence encoding the CDR3 and a sequence derived fromone of the at least two different TCR V genes, thereby generating thenucleic acid molecule encoding the TCR chain or portion thereof.[2] The method of paragraph [1], wherein the least one nucleic acidmolecule comprises a first plurality of nucleic acid molecules, whereineach nucleic acid molecule of the first plurality of nucleic acidmolecules comprises a sequence encoding a CDR3 of a TCR chain.[3] The method of paragraph [1] or [2], wherein the at least one nucleicacid molecule of (a) is capable of specifically linking to a nucleicacid molecule of the plurality of nucleic acid molecules that comprisesa sequence derived from any single given TCR V gene of the at least twodifferent TCR V genes.[4] The method of paragraph [1], wherein the at least one nucleic acidmolecule further comprises a J region of the TCR chain.[5] The method of paragraph [2], wherein each nucleic acid molecule ofthe first plurality of nucleic acid molecules further comprises a Jregion of a TCR chain.[6] The method of any one of paragraphs [1]-[5], wherein the at leasttwo TCR V genes are human TCR V genes or mouse 
TCR V genes.[7] The method of any one of paragraphs [1]-[6], wherein the at leasttwo TCR V genes are selected from the group consisting of a humanTRAV1-1, TRAV1-2, TRAV2, TRAV3, TRAV4, TRAV5, TRAV6, TRAV7, TRAV8-1,TRAV8-2, TRAV8-3, TRAV8-4, TRAV8-6, TRAV9-1, TRAV9-2, TRAV10, TRAV12-1,TRAV12-2, TRAV12-3, TRAV13-1, TRAV13-2, TRAV14, TRAV16, TRAV17, TRAV18,TRAV19, TRAV20, TRAV21, TRAV22, TRAV23, TRAV24, TRAV25, TRAV26-1,TRAV26-2, TRAV27, TRAV29, TRAV30, TRAV34, TRAV35, TRAV36, TRAV38-1,TRAV38-2, TRAV39, TRAV40, and TRAV41.[8] The method of any one of paragraphs [1]-[6], wherein the at leasttwo TCR V genes are selected from the group consisting of a human TRBV2,TRBV3-1, TRBV4-1, TRBV4-2, TRBV4-3, TRBV5-1, TRBV5-4, TRBV5-5, TRBV5-6,TRBV5-8, TRBV6-1, TRBV6-2, TRBV6-3, TRBV6-4, TRBV6-5, TRBV6-6, TRBV6-8,TRBV6-9, TRBV7-2, TRBV7-3, TRBV7-4, TRBV7-6, TRBV7-7, TRBV7-8, TRBV7-9,TRBV9, TRBV10-1, TRBV10-2, TRBV10-3, TRBV11-1, TRBV11-2, TRBV11-3,TRBV12-3, TRBV12-4, TRBV12-5, TRBV13, TRBV14, TRBV15, TRBV16, TRBV18,TRBV19, TRBV20-1, TRBV24-1, TRBV25-1, TRBV27, TRBV28, TRBV29-1, andTRBV30.[9] The method of any one of paragraphs [1]-[8], wherein each sequenceof the plurality of sequences derived from the at least two differentTCR V genes comprises a sequence encoding L-PART1, L-PART2, FR1, CDR1,FR2, CDR2, and/or FR3.[10] The method of any one of paragraphs [1]-[9], wherein the TCR chainis a TCR alpha chain, a TCR beta chain, a TCR gamma chain, or a TCRdelta chain.[11] The method of any one of paragraphs [1]-[10], wherein the at leastone nucleic acid molecule further comprises an additional sequenceencoding an additional CDR3 of an additional TCR chain.[12] The method of paragraph [11], wherein the at least one nucleic acidmolecule comprises an additional J region of the additional TCR chain.[13] The method of paragraph [11] or [12], wherein the sequence encodingthe CDR3 and the additional sequence encoding the additional CDR3 areseparated by at most 100 nucleotides.[14] The method of 
any one of paragraphs [11]-[13], wherein the TCR chain and the additional TCR chain are a cognate pair of TCR chains.
[15] The method of any one of paragraphs [1]-[14], wherein the at least one nucleic acid molecule comprises a connector sequence, which connector sequence is capable of linking the at least one nucleic acid molecule to the nucleic acid molecule of the plurality of nucleic acid molecules to generate the third nucleic acid molecule.
[16] The method of paragraph [15], wherein the at least one nucleic acid molecule and the nucleic acid molecule of the plurality of nucleic acid molecules encodes a functional TCR chain or portion thereof.
[17] The method of paragraph [15] or [16], wherein the nucleic acid molecule of the plurality of nucleic acid molecules comprises an anti-connector sequence, which anti-connector sequence is complementary to the connector sequence of the at least one nucleic acid molecule of (a).
[18] The method of any one of paragraphs [1]-[17], further comprising linking the at least one nucleic acid molecule of (a) and the nucleic acid molecule of the plurality of nucleic acid molecules of (b).
[19] The method of paragraph [18], wherein linking comprises hybridizing the at least one nucleic acid molecule of (a) and the nucleic acid molecule of the plurality of nucleic acid molecules of (b).
[20] The method of paragraph [19], wherein hybridizing comprises hybridizing the connector sequence of the at least one nucleic acid molecule of (a) with the anti-connector sequence of the nucleic acid molecule of the plurality of nucleic acid molecules of (b).
[21] The method of any one of paragraphs [18]-[20], further comprising (i) extending a free 3′ end of the nucleic acid molecule of the plurality of nucleic acid molecules using the at least one nucleic acid molecule of (a) as a template, and/or (ii) extending a free 3′ end of the at least one nucleic acid molecule of (a) using the nucleic acid molecule of the plurality of nucleic acid molecules as a template, to generate
the third nucleic acid molecule.
[22] The method of any one of paragraphs [1]-[21], further comprising ligating the at least one nucleic acid molecule of (a) and the nucleic acid molecule of the plurality of nucleic acid molecules of (b).
[23] The method of any one of paragraphs [1]-[22], further comprising contacting the third nucleic acid molecule with a restriction enzyme to generate a sticky end.
[24] The method of any one of paragraphs [1]-[23], further comprising contacting the third nucleic acid molecule with an additional nucleic acid molecule.
[25] The method of paragraph [24], wherein the additional nucleic acid molecule encodes a constant region or portion thereof of a TCR chain.
[26] The method of paragraph [24] or [25], further comprising ligating the third nucleic acid molecule and the additional nucleic acid molecule.
[27] The method of any one of paragraphs [1]-[26], wherein a plurality of nucleic acid molecules, each encoding a different TCR chain or portion thereof, are generated in the same compartment.
[28] The method of paragraph [27], wherein at least five different nucleic acid molecules of the plurality of nucleic acid molecules are generated in the same compartment.
[29] The method of any one of paragraphs [1]-[26], wherein at least ten different nucleic acid molecules of the plurality of nucleic acid molecules are generated in the same compartment.
[30] The method of any one of paragraphs [1]-[29], wherein the same compartment is a well, a tube, or a droplet.
[31] The method of any one of paragraphs [1]-[30], wherein the at least one nucleic acid molecule comprises a unique barcode.
[32] The method of paragraph [31], wherein the unique barcode is a primer binding site.
[33] The method of any one of paragraphs [15]-[30], wherein the connector sequence comprises a unique barcode.
[34] The method of paragraph [33], wherein the unique barcode is a primer binding site.
[35] A composition comprising (a) a plurality of nucleic acid molecules, wherein each nucleic acid molecule of the
plurality of nucleic acid molecules comprises a sequence derived from a T-cell receptor (TCR) V gene and does not comprise a CDR3 sequence, wherein a first nucleic acid molecule of the plurality comprises a first anti-connector sequence and a second nucleic acid molecule of the plurality comprises a second anti-connector sequence, wherein the first anti-connector sequence is different from the second anti-connector sequence, and wherein the sequence derived from a TCR V gene of the first nucleic acid molecule and the second nucleic acid molecule are derived from a different TCR V gene; and (b) at least one nucleic acid molecule comprising a sequence encoding a CDR3 of a TCR chain, wherein the at least one nucleic acid molecule further comprises a first connector sequence complementary to the first anti-connector sequence.
[36] The composition of paragraph [35], wherein the composition is a liquid composition.
[37] The composition of paragraph [35] or [36], wherein the plurality of nucleic acid molecules of (a) and the at least one nucleic acid molecule of (b) are in a same compartment.
[38] The composition of any one of paragraphs [35]-[37], wherein the sequence derived from the TCR V gene comprises at least ten nucleotides of the TCR V gene.
[39] The composition of any one of paragraphs [35]-[38], wherein the TCR V gene is a TRAV gene, a TRBV gene, a TRGV gene, or a TRDV gene.
[40] The composition of any one of paragraphs [35]-[39], wherein the sequence derived from the TCR V gene comprises a sequence encoding L-PART1, L-PART2, FR1, CDR1, FR2, CDR2, and/or FR3.
[41] The composition of any one of paragraphs [35]-[40], wherein the at least one nucleic acid molecule further comprises a J region of the TCR chain.
[42] The composition of any one of paragraphs [35]-[41], wherein the at least one nucleic acid molecule further comprises an additional sequence encoding an additional CDR3 of an additional TCR chain.
[43] The composition of paragraph [42], wherein the at least one nucleic acid molecule
further comprises an additional J region of the additional TCR chain.
[44] The composition of paragraph [42] or [43], wherein the sequence encoding the CDR3 and the additional sequence encoding the additional CDR3 are separated by at most 100 nucleotides.
[45] The composition of any one of paragraphs [42]-[44], wherein the TCR chain and the additional TCR chain are a cognate pair of TCR chains.
[46] The composition of any one of paragraphs [35]-[45], wherein the at least one nucleic acid molecule of (b) comprises a first plurality of nucleic acid molecules, and wherein each nucleic acid molecule of the first plurality of nucleic acid molecules comprises a sequence encoding a CDR3 of a TCR chain.
[47] The composition of paragraph [46], wherein each nucleic acid molecule of the first plurality of nucleic acid molecules encodes a different CDR3 of a different TCR chain.
[48] The composition of paragraph [46] or [47], wherein each nucleic acid molecule of the first plurality of nucleic acid molecules comprises a different connector sequence, which different connector sequence is capable of specifically linking to a nucleic acid molecule of the plurality of nucleic acid molecules that comprises a sequence derived from any single given TCR V gene.
[49] The composition of any one of paragraphs [35]-[48], wherein the first anti-connector sequence or the second anti-connector sequence comprises a TCR V gene sequence.
[50] The composition of paragraph [49], wherein the TCR V gene sequence comprises at least three nucleotides of the TCR V gene adjacent to a sequence encoding a CDR3 in a rearranged gene.
[51] The composition of any one of paragraphs [35]-[50], wherein the first anti-connector sequence or the second anti-connector sequence comprises a pre-determined sequence.
[52] The composition of any one of paragraphs [35]-[51], wherein the first connector sequence hybridizes to the first anti-connector sequence.
[53] The composition of any one of paragraphs [35]-[52], wherein the at least one nucleic acid molecule of
(b) comprises a unique barcode.
[54] The composition of paragraph [53], wherein the unique barcode is a primer binding site.
[55] The composition of any one of paragraphs [35]-[52], wherein the first connector sequence of the at least one nucleic acid molecule comprises a unique barcode.
[56] The composition of paragraph [55], wherein the unique barcode is a primer binding site.
[57] A method for generating a plurality of nucleic acid molecules, comprising: (a) providing a first plurality of nucleic acid molecules, wherein a nucleic acid molecule of the first plurality of nucleic acid molecules comprises a sequence encoding a first CDR3 of a first T-cell receptor (TCR) chain and a second CDR3 of a second TCR chain, wherein the first CDR3 and the second CDR3 are from a cognate pair of TCR chains; (b) providing a second plurality of nucleic acid molecules, wherein a nucleic acid molecule of the second plurality of nucleic acid molecules comprises a sequence derived from a TCR V gene, wherein the nucleic acid molecule does not comprise a sequence encoding a constant domain; and (c) contacting the first plurality of nucleic acid molecules and the second plurality of nucleic acid molecules, wherein the nucleic acid molecule of the first plurality of nucleic acid molecules links with the nucleic acid molecule of the second plurality of nucleic acid molecules to form a nucleic acid molecule comprising the sequence encoding the first CDR3 and the second CDR3 and the sequence derived from the TCR V gene, wherein the sequence encoding the first CDR3 and the second CDR3 and the TCR V gene are derived from the cognate pair of TCR chains.
[58] The method of paragraph [57], wherein each nucleic acid molecule of the first plurality of nucleic acid molecules comprises a sequence encoding a different first CDR3 of a first TCR chain and/or a different CDR3 of a second TCR chain.
[59] The method of paragraph [57] or [58], wherein each nucleic acid molecule of the second plurality of nucleic acid molecules comprises
a sequence derived from a different TCR V gene.
[60] The method of any one of paragraphs [57]-[59], wherein the first plurality of nucleic acid molecules and the second plurality of nucleic acid molecules are contacted in a same compartment.
[61] The method of any one of paragraphs [57]-[60], wherein the nucleic acid molecule of the first plurality of nucleic acid molecules further comprises a connector sequence, wherein the connector sequence links the nucleic acid molecule of the first plurality of nucleic acid molecules and the nucleic acid molecule of the second plurality of nucleic acid molecules.
[62] The method of paragraph [61], wherein the nucleic acid molecule of the second plurality of nucleic acid molecules further comprises an anti-connector sequence, which anti-connector sequence is complementary to the connector sequence.
[63] The method of paragraph [62], wherein the connector sequence hybridizes to the anti-connector sequence to link the nucleic acid molecule of the first plurality of nucleic acid molecules and the nucleic acid molecule of the second plurality of nucleic acid molecules.
[64] The method of any one of paragraphs [58]-[63], wherein the connector sequence is codon-diversified such that the connector sequence of the nucleic acid molecule of the first plurality of nucleic acid molecules is different from other connector sequences of other nucleic acid molecules of the first plurality of nucleic acid molecules.
[65] The method of any one of paragraphs [57]-[64], wherein the nucleic acid molecule of the first plurality of nucleic acid molecules further comprises a first J region of the first TCR chain and/or a second J region of the second TCR chain.
[66] The method of any one of paragraphs [57]-[65], wherein (i) the first TCR chain is a TCR alpha chain and the second TCR chain is a TCR beta chain or (ii) the first TCR chain is a TCR gamma chain and the second TCR chain is a TCR delta chain.
[67] The method of any one of paragraphs [57]-[66], wherein the TCR V gene is a
TRAV gene, a TRBV gene, a TRGV gene, or a TRDV gene.
[68] The method of any one of paragraphs [57]-[67], wherein the nucleic acid molecule of the second plurality of nucleic acid molecules is a double-stranded nucleic acid molecule.
[69] The method of any one of paragraphs [57]-[68], wherein the nucleic acid molecule of the second plurality of nucleic acid molecules further comprises a sequence encoding a portion of a self-cleaving peptide.
[70] The method of any one of paragraphs [62]-[69], wherein the anti-connector sequence is an overhang of the nucleic acid molecule of the second plurality of nucleic acid molecules.
[71] The method of any one of paragraphs [62]-[70], wherein the connector sequence or the anti-connector sequence is at least three nucleotides in length.
[72] The method of any one of paragraphs [63]-[71], further comprising (i) extending a 3′ end of the nucleic acid molecule of the first plurality of nucleic acid molecules hybridized thereto with the nucleic acid molecule of the second plurality of nucleic acid molecules and/or (ii) extending a 3′ end of the nucleic acid molecule of the second plurality of nucleic acid molecules hybridized thereto with the nucleic acid molecule of the first plurality of nucleic acid molecules.
[73] The method of any one of paragraphs [57]-[72], further comprising ligating the nucleic acid molecule of the first plurality of nucleic acid molecules with the nucleic acid molecule of the second plurality of nucleic acid molecules.
[74] The method of any one of paragraphs [57]-[73], further comprising contacting the nucleic acid molecule comprising the sequence encoding the first CDR3 and the second CDR3 and the sequence derived from the TCR V gene with a restriction enzyme to generate a sticky end.
[75] The method of any one of paragraphs [57]-[74], further comprising contacting the nucleic acid molecule comprising the sequence encoding the first CDR3 and the second CDR3 and the sequence derived from the TCR V gene with an additional nucleic acid molecule comprising a
sequence encoding a constant region or portion thereof.
[76] The method of paragraph [74] or [75], further comprising ligating the nucleic acid molecule comprising the sequence encoding the first CDR3 and the second CDR3 and the sequence derived from the TCR V gene with the additional nucleic acid molecule through the sticky end.
[77] The method of any one of paragraphs [57]-[76], wherein the sequence encoding the first CDR3 and the sequence encoding the second CDR3 are separated by at most 100 nucleotides.
[78] The method of any one of paragraphs [57]-[77], wherein the sequence derived from the TCR V gene comprises a sequence encoding FR1, CDR1, FR2, CDR2, and FR3.
[79] The method of any one of paragraphs [57]-[77], wherein the sequence derived from the TCR V gene comprises a sequence encoding L-PART1, L-PART2, FR1, CDR1, FR2, CDR2, and FR3.
[80] A composition comprising: (a) a first plurality of nucleic acid molecules, wherein each nucleic acid molecule of the first plurality of nucleic acid molecules comprises a sequence encoding a first CDR3 of a first T-cell receptor (TCR) chain and a second CDR3 of a second TCR chain, wherein the first CDR3 and the second CDR3 are from a cognate pair of TCR chains; and (b) a second plurality of nucleic acid molecules, wherein each nucleic acid molecule of the second plurality of nucleic acid molecules comprises a sequence derived from a TCR V gene, and wherein each nucleic acid molecule of the second plurality of nucleic acid molecules does not comprise a sequence encoding the first CDR3 and the second CDR3; wherein (i) each nucleic acid molecule of the first plurality of nucleic acid molecules comprises a sequence encoding a different first CDR3 and/or second CDR3, and/or (ii) each nucleic acid molecule of the second plurality of nucleic acid molecules comprises a sequence derived from a different TCR V gene.
[81] The composition of paragraph [80], wherein each nucleic acid molecule of the first plurality of nucleic acid molecules further comprises a
connector sequence, wherein a given connector sequence is usable to link a given nucleic acid molecule of the first plurality of nucleic acid molecules and a given nucleic acid molecule of the second plurality of nucleic acid molecules.
[82] The composition of paragraph [80] or [81], wherein each nucleic acid molecule of the second plurality of nucleic acid molecules further comprises an anti-connector sequence, which anti-connector sequence is complementary to the connector sequence.
[83] The composition of paragraph [81] or [82], wherein the connector sequence is codon-diversified such that the given connector sequence of the given nucleic acid molecule of the first plurality of nucleic acid molecules is different from other connector sequences of other nucleic acid molecules of the first plurality of nucleic acid molecules.
[84] The composition of any one of paragraphs [81]-[83], wherein the connector sequence encodes an amino acid sequence.
[85] The composition of paragraph [84], wherein the connector sequence is in frame with the sequence encoding the first CDR3 of the first TCR chain and the second CDR3 of the second TCR chain.
[86] The composition of any one of paragraphs [81]-[85], wherein the connector sequence comprises at least three nucleotides.
[87] The composition of paragraph [86], wherein the connector sequence comprises at least three nucleotides of the TCR V gene adjacent to a sequence encoding the first CDR3 of the first TCR chain or the second CDR3 of the second TCR chain in a rearranged gene.
[88] The composition of any one of paragraphs [84]-[87], wherein a given amino acid sequence encoded by the given connector sequence is the same or substantially the same as at least one other amino acid sequence encoded by at least one other connector sequence.
[89] The composition of any one of paragraphs [84]-[87], wherein a given amino acid sequence encoded by the given connector sequence is different from other amino acid sequences encoded by other connector sequences.
[90] The
composition of any one of paragraphs [80]-[89], wherein each nucleic acid molecule of the first plurality of nucleic acid molecules further comprises a first J region of the first TCR chain and/or a second J region of the second TCR chain.
[91] The composition of any one of paragraphs [80]-[90], wherein the composition is a liquid composition.
[92] The composition of any one of paragraphs [80]-[91], wherein the first plurality of nucleic acid molecules and the second plurality of nucleic acid molecules are within a same compartment.
[93] The composition of any one of paragraphs [81]-[92], wherein the given nucleic acid molecule of the first plurality of nucleic acid molecules is linked to the given nucleic acid molecule of the second plurality of nucleic acid molecules through the given connector sequence.
[94] The composition of paragraph [93], wherein the given nucleic acid molecule of the first plurality of nucleic acid molecules hybridizes to the given nucleic acid molecule of the second plurality of nucleic acid molecules through the given connector sequence hybridized to a given anti-connector sequence.
[95] The composition of any one of paragraphs [80]-[94], wherein the sequence encoding the first CDR3 and the sequence encoding the second CDR3 are separated by at most 100 nucleotides.
[96] The composition of any one of paragraphs [80]-[95], wherein the sequence derived from the TCR V gene comprises a sequence encoding FR1, CDR1, FR2, CDR2, and FR3.
[97] The composition of any one of paragraphs [80]-[95], wherein the sequence derived from the TCR V gene comprises a sequence encoding L-PART1, L-PART2, FR1, CDR1, FR2, CDR2, and FR3.
[98] The composition of any one of paragraphs [80]-[97], wherein each nucleic acid molecule of the first plurality of nucleic acid molecules or the second plurality of molecules is chemically synthesized.
[99] The composition of any one of paragraphs [80]-[98], wherein each nucleic acid molecule of the first plurality of nucleic acid molecules is at most about 250
nucleotides long.
[100] A composition comprising a plurality of nucleic acid molecules, each nucleic acid molecule of the plurality of nucleic acid molecules comprising a sequence derived from a T-cell receptor (TCR) V gene, wherein the plurality of nucleic acid molecules comprises a first nucleic acid molecule having a first connector sequence and a second nucleic acid molecule having a second connector sequence, wherein the first connector sequence is different from the second connector sequence.
[101] The composition of paragraph [100], wherein each nucleic acid molecule of the plurality of nucleic acid molecules comprises a sequence derived from a different TCR V gene.
[102] The composition of paragraph [100] or [101], wherein each nucleic acid molecule of the plurality of nucleic acid molecules comprises a different connector sequence.
[103] The composition of any one of paragraphs [100]-[102], wherein each nucleic acid molecule of the plurality of nucleic acid molecules does not comprise a sequence encoding a CDR3 of a TCR chain.
[104] The composition of any one of paragraphs [100]-[103], wherein each nucleic acid molecule of the plurality of nucleic acid molecules does not comprise a sequence encoding a constant domain of a TCR chain.
[105] The composition of any one of paragraphs [100]-[104], wherein the sequence derived from the TCR V gene comprises at least ten nucleotides of the TCR V gene.
[106] The composition of any one of paragraphs [100]-[105], wherein the TCR V gene is a TRAV gene, a TRBV gene, a TRGV gene, or a TRDV gene.
[107] A composition comprising a plurality of nucleic acid molecules, each nucleic acid molecule of the plurality of nucleic acid molecules encoding a CDR3 of a T-cell receptor (TCR) chain, wherein a first nucleic acid molecule of the plurality comprises a first connector sequence and a second nucleic acid molecule of the plurality comprises a second connector sequence, wherein the first connector sequence is different from the second connector sequence.
[108] The composition
of paragraph [107], wherein each nucleic acid molecule of the plurality of nucleic acid molecules further comprises a J region of a TCR chain.
[109] The composition of paragraph [107], wherein each nucleic acid molecule of the plurality of nucleic acid molecules encodes a first CDR3 of a first TCR chain and a second CDR3 of a second TCR chain.
[110] The composition of paragraph [109], wherein each nucleic acid molecule of the plurality of nucleic acid molecules further comprises a first J region of a first TCR chain and a second J region of a second TCR chain.
[111] The composition of any one of paragraphs [107]-[110], wherein each nucleic acid molecule of the plurality of nucleic acid molecules encodes a different CDR3 of a different TCR chain.
[112] The composition of any one of paragraphs [107]-[111], wherein each nucleic acid molecule of the plurality of nucleic acid molecules comprises a different connector sequence.
[113] The composition of any one of paragraphs [107]-[112], wherein each nucleic acid molecule of the plurality of nucleic acid molecules does not comprise greater than 200 nucleotides of a TCR V gene.
[114] The composition of any one of paragraphs [107]-[113], wherein each nucleic acid molecule of the plurality of nucleic acid molecules does not comprise a sequence encoding a constant domain of a TCR chain.
[115] The composition of any one of paragraphs [100]-[114], wherein the first connector sequence or the second connector sequence comprises a sequence derived from a TCR V gene.
[116] The composition of paragraph [115], wherein the sequence derived from the TCR V gene comprises at least three nucleotides of the TCR V gene adjacent to a sequence encoding a CDR3 in a rearranged gene.
[117] The composition of any one of paragraphs [100]-[116], wherein the first connector sequence or the second connector sequence comprises a pre-determined sequence.
[118] The composition of any one of paragraphs [107]-[114], wherein the first connector sequence or the second connector sequence
comprises a sequence complementary to a TCR V gene sequence.
[119] The composition of any one of paragraphs [107]-[114] and [118], wherein the composition further comprises a second plurality of nucleic acid molecules, each nucleic acid molecule of the second plurality of nucleic acid molecules comprising a sequence derived from a TCR V gene.
[120] The composition of paragraph [119], wherein a first nucleic acid molecule of the second plurality comprises a first anti-connector sequence, which first anti-connector sequence is complementary to the first connector sequence.
[121] The composition of paragraph [119] or [120], wherein a second nucleic acid molecule of the second plurality comprises a second anti-connector sequence, which second anti-connector sequence is complementary to the second connector sequence.
[122] The composition of paragraph [120] or [121], wherein the first anti-connector sequence of the first nucleic acid molecule of the second plurality is linked to the first connector sequence of the first nucleic acid molecule of the first plurality.
[123] The composition of paragraph [121] or [122], wherein the second anti-connector sequence of the second nucleic acid molecule of the second plurality is linked to the second connector sequence of the second nucleic acid molecule of the first plurality.
[124] A composition comprising a plurality of nucleic acid molecules, each comprising a sequence encoding at least ten amino acids of a T-cell receptor (TCR) chain, wherein a first nucleic acid molecule of the plurality comprises a first connector sequence and a second nucleic acid molecule of the plurality comprises a second connector sequence, wherein the first connector sequence is different from the second connector sequence, wherein the first connector sequence or the second connector sequence encodes a portion of a TCR chain and wherein the first connector sequence or the second connector sequence is in frame with the sequence encoding at least ten amino acids of a TCR chain.
[125]
The composition of paragraph [124], wherein the first connector sequence or the second connector sequence comprises at least four contiguous nucleotides of a TCR chain gene and is in frame with the sequence encoding at least ten amino acids of a TCR chain.
[126] The composition of paragraph [124] or [125], wherein the first connector sequence and the second connector sequence encodes at least two contiguous amino acids of a TCR chain.
[127] The composition of any one of paragraphs [124]-[126], wherein the TCR chain of the portion of the TCR chain and the TCR chain encoded by the sequence encoding at least ten amino acids is the same.
[128] The composition of paragraph [124], wherein each nucleic acid molecule of the plurality of nucleic acid molecules comprises a sequence derived from a TCR V gene.
[129] The composition of any one of paragraphs [124]-[128], wherein each nucleic acid molecule of the plurality of nucleic acid molecules encodes a CDR3 of the TCR chain.
[130] The composition of paragraph [129], wherein each nucleic acid molecule of the plurality of nucleic acid molecules further comprises a J region of the TCR chain.
[131] The composition of paragraph [129], wherein each nucleic acid molecule of the plurality of nucleic acid molecules encodes a first CDR3 of a first TCR chain and a second CDR3 of a second TCR chain.
[132] The composition of paragraph [131], wherein each nucleic acid molecule of the plurality of nucleic acid molecules further comprises a first J region of a first TCR chain and a second J region of a second TCR chain.
[133] The composition of paragraph [131] or [132], wherein a sequence encoding the first CDR3 and a sequence encoding the second CDR3 are separated by at most 100 nucleotides.
[134] The composition of any one of paragraphs [124]-[133], wherein the first connector sequence or the second connector sequence comprises a sequence derived from a TCR V gene.
[135] The composition of any one of paragraphs [124]-[134], wherein the first connector sequence or the
second connector sequence comprises a pre-determined sequence.
[136] The composition of any one of paragraphs [100]-[135], wherein the first connector sequence comprises at least one nucleotide that is different from a nucleotide of the second connector sequence.
[137] The composition of any one of paragraphs [100]-[136], wherein the first connector sequence encodes a same amino acid sequence as the second connector sequence.
[138] The composition of any one of paragraphs [100]-[136], wherein the first connector sequence encodes a different amino acid sequence from the second connector sequence.
[139] A method for generating a plurality of nucleic acid molecules, each nucleic acid molecule of the plurality encoding a T-cell receptor (TCR) chain or region thereof, comprising: contacting a first plurality of nucleic acid molecules and a second plurality of nucleic acid molecules to generate a third plurality of nucleic acid molecules comprising at least two different nucleic acid molecules, wherein each of the at least two different nucleic acid molecules has a different sequence encoding a different TCR chain or region thereof, and wherein the at least two different nucleic acid molecules are generated in a same compartment.
[140] The method of paragraph [139], wherein each nucleic acid molecule of the first plurality of nucleic acid molecules comprises a sequence encoding a CDR3 of the TCR chain.
[141] The method of paragraph [140], wherein each nucleic acid molecule of the first plurality of nucleic acid molecules comprises a J region of the TCR chain.
[142] The method of any one of paragraphs [139]-[141], wherein each nucleic acid molecule of the second plurality of nucleic acid molecules comprises a sequence derived from a TCR V gene of the TCR chain.
[143] The method of paragraph [142], wherein the TCR V gene is a human TCR V gene.
[144] The method of paragraph [142] or [143], wherein the TCR V gene is a human TRAV1-1, TRAV1-2, TRAV2, TRAV3, TRAV4, TRAV5, TRAV6, TRAV7, TRAV8-1, TRAV8-2,
TRAV8-3, TRAV8-4, TRAV8-6, TRAV9-1, TRAV9-2, TRAV10, TRAV12-1, TRAV12-2, TRAV12-3, TRAV13-1, TRAV13-2, TRAV14, TRAV16, TRAV17, TRAV18, TRAV19, TRAV20, TRAV21, TRAV22, TRAV23, TRAV24, TRAV25, TRAV26-1, TRAV26-2, TRAV27, TRAV29, TRAV30, TRAV34, TRAV35, TRAV36, TRAV38-1, TRAV38-2, TRAV39, TRAV40, or TRAV41.
[145] The method of paragraph [142] or [143], wherein the TCR V gene is a human TRBV2, TRBV3-1, TRBV4-1, TRBV4-2, TRBV4-3, TRBV5-1, TRBV5-4, TRBV5-5, TRBV5-6, TRBV5-8, TRBV6-1, TRBV6-2, TRBV6-3, TRBV6-4, TRBV6-5, TRBV6-6, TRBV6-8, TRBV6-9, TRBV7-2, TRBV7-3, TRBV7-4, TRBV7-6, TRBV7-7, TRBV7-8, TRBV7-9, TRBV9, TRBV10-1, TRBV10-2, TRBV10-3, TRBV11-1, TRBV11-2, TRBV11-3, TRBV12-3, TRBV12-4, TRBV12-5, TRBV13, TRBV14, TRBV15, TRBV16, TRBV18, TRBV19, TRBV20-1, TRBV24-1, TRBV25-1, TRBV27, TRBV28, TRBV29-1, or TRBV30.
[146] The method of any one of paragraphs [139]-[145], wherein the sequence derived from the TCR V gene comprises a sequence encoding FR1, CDR1, FR2, CDR2, and FR3.
[147] The method of any one of paragraphs [139]-[145], wherein the sequence derived from the TCR V gene comprises a sequence encoding L-PART1, L-PART2, FR1, CDR1, FR2, CDR2, and FR3.
[148] The method of any one of paragraphs [139]-[147], wherein the TCR chain is a TCR alpha chain, a TCR beta chain, a TCR gamma chain, or a TCR delta chain.
[149] The method of any one of paragraphs [140]-[148], wherein each nucleic acid molecule of the first plurality of nucleic acid molecules further comprises an additional sequence encoding an additional CDR3 of an additional TCR chain.
[150] The method of paragraph [149], wherein each nucleic acid molecule of the first plurality of nucleic acid molecules comprises an additional J region of the additional TCR chain.
[151] The method of paragraph [149] or [150], wherein the TCR chain and the additional TCR chain are a cognate pair of TCR chains.
[152] The method of any one of paragraphs [139]-[151], wherein a nucleic acid molecule of the plurality of nucleic acid molecules encodes a different TCR
or region thereof.
[153] The method of any one of paragraphs [139]-[152], wherein a given nucleic acid molecule of the first plurality of nucleic acid molecules comprises a connector sequence, which connector sequence is usable for linking the given nucleic acid molecule of the first plurality of nucleic acid molecules to a given nucleic acid molecule of the second plurality of nucleic acid molecules.
[154] The method of paragraph [153], wherein the given nucleic acid molecule of the first plurality of nucleic acid molecules and the given nucleic acid molecule of the second plurality of nucleic acid molecules encodes a functional TCR chain or region thereof.
[155] The method of paragraph [153] or [154], wherein the given nucleic acid molecule of the second plurality of nucleic acid molecules comprises an anti-connector sequence, which anti-connector sequence is complementary to the connector sequence of the given nucleic acid molecule of the first plurality of nucleic acid molecules.
[156] The method of any one of paragraphs [153]-[155], further comprising linking the given nucleic acid molecule of the first plurality of nucleic acid molecules and the given nucleic acid molecule of the second plurality of nucleic acid molecules.
[157] The method of paragraph [156], wherein linking comprises hybridizing the given nucleic acid molecule of the first plurality of nucleic acid molecules and the given nucleic acid molecule of the second plurality of nucleic acid molecules.
[158] The method of paragraph [157], wherein hybridizing comprises hybridizing the connector sequence of the given nucleic acid molecule of the first plurality of nucleic acid molecules with the anti-connector sequence of the given nucleic acid molecule of the second plurality of nucleic acid molecules.
[159] The method of any one of paragraphs [156]-[158], further comprising (i) extending a free 3′ end of the given nucleic acid molecule of the second plurality of nucleic acid molecules using the given nucleic acid molecule of the
first plurality of nucleic acidmolecules as a template, and/or (ii) extending a free 3′ end of thenucleic acid molecule of the first plurality of nucleic acid moleculesusing the given nucleic acid molecule of the second plurality of nucleicacid molecules as a template, to generate a nucleic acid molecule of thethird plurality of nucleic acid molecules.[160] The method of any one of paragraphs [139]-[159], furthercomprising ligating the given nucleic acid molecule of the firstplurality of nucleic acid molecules and the given nucleic acid moleculeof the second plurality of nucleic acid molecules.[161] The method of any one of paragraphs [139]-[160], furthercomprising contacting the nucleic acid molecule of the third pluralityof nucleic acid molecules with a restriction enzyme to generate a stickyend.[162] The method of any one of paragraphs [139]-[161], furthercomprising contacting the nucleic acid molecule of the third pluralityof nucleic acid molecules with an additional nucleic acid molecule.[163] The method of paragraph [162], wherein the additional nucleic acidmolecule encodes a constant region or a portion thereof of a TCR chain.[164] The method of paragraph [162] or [163], further comprisingligating the nucleic acid molecule of the third plurality of nucleicacid molecules and the additional nucleic acid molecule.[165] The method of any one of paragraphs [139]-[164], wherein at leastfive different nucleic acid molecules of the third plurality of nucleicacid molecules are generated in the same compartment.[166] The method of any one of paragraphs [139]-[165], wherein at leastten different nucleic acid molecules of the third plurality of nucleicacid molecules are generated in the same compartment.[167] The method of any one of paragraphs [139]-[166], wherein the samecompartment is a well, a tube, or a droplet.[168] A method for generating a plurality of nucleic acid molecules,comprising: (a) providing a first plurality of nucleic acid molecules,wherein a nucleic 
acid molecule of the first plurality of nucleic acidmolecules comprises a sequence encoding a first CDR3 of a first T-cellreceptor (TCR) chain and a second CDR3 of a second TCR chain, whereinthe first CDR3 and the second CDR3 are from a cognate pair of TCRchains; (b) providing a second plurality of nucleic acid molecules,wherein a nucleic acid molecule of the second plurality of nucleic acidmolecules comprises a sequence derived from a TCR V gene; and (c)contacting the first plurality of nucleic acid molecules and the secondplurality of nucleic acid molecules, wherein the nucleic acid moleculeof the first plurality of nucleic acid molecules links with the nucleicacid molecule of the second plurality of nucleic acid molecules to forma linear nucleic acid molecule comprising the sequence encoding thefirst CDR3 and the second CDR3 and the sequence derived from the TCR Vgene, wherein the sequence encoding the first CDR3 and the second CDR3and the TCR V gene are derived from the cognate pair of TCR chains.[169] A method for generating a plurality of nucleic acid molecules,comprising: (a) providing a first plurality of nucleic acid molecules,wherein a nucleic acid molecule of the first plurality of nucleic acidmolecules comprises (i) a synthetic sequence encoding a first CDR3 of afirst T-cell receptor (TCR) chain and a second CDR3 of a second TCRchain and (ii) a synthetic sequence encoding a third CDR3 of a thirdT-cell receptor (TCR) chain and a fourth CDR3 of a fourth TCR chain,wherein the first CDR3 and the second CDR3 are from a first cognate pairof TCR chains and wherein the third CDR3 and the fourth CDR3 are from asecond cognate pair of TCR chains; (b) providing a second plurality ofnucleic acid molecules, wherein a nucleic acid molecule of the secondplurality of nucleic acid molecules comprises a sequence derived from aTCR V gene; and (c) contacting the first plurality of nucleic acidmolecules and the second plurality of nucleic acid molecules, whereinthe nucleic 
acid molecule of the first plurality of nucleic acidmolecules links with the nucleic acid molecule of the second pluralityof nucleic acid molecules to form a nucleic acid molecule comprising thesequence encoding the first CDR3 and the second CDR3 and the sequencederived from the TCR V gene, wherein the sequence encoding the firstCDR3 and the second CDR3 and the TCR V gene are derived from the cognatepair of TCR chains.[170] A method of identifying a sequence of a natively paired T-cellreceptor (TCR) in a tissue sample from a subject, comprising: (a)identifying one or more paired sequences of one or more natively pairedTCRs in a sample containing a plurality of peripheral T cells obtainedfrom the subject, wherein each of the one or more paired sequencescomprises a CDR3 sequence; and (b) identifying a tissue CDR3 sequence ofa TCR chain of a TCR in the tissue sample for which the other TCR chainto which it is natively paired is unknown, wherein the tissue CDR3sequence matches a CDR3 sequence of at least one paired sequence of theone or more paired sequences of the one or more natively paired TCRs,thereby identifying the at least one paired sequence as the sequence ofthe natively paired TCR in the tissue sample.[171] The method of paragraph [170], wherein identifying in (a)comprises sequencing the one or more natively paired TCRs in the samplecontaining the plurality of peripheral T cells.[172] The method of paragraph [171], wherein the sequencing comprisessingle cell sequencing.[173] The method of paragraph [172], wherein the single cell sequencingcomprises partitioning the plurality of peripheral T cells into aplurality of compartments, each compartment comprising an individualperipheral T cell of the plurality of peripheral T cells.[174] The method of any one of paragraphs [170]-[173], wherein thetissue sample is not a bodily fluid sample.[175] The method of any one of paragraphs [170]-[174], wherein thetissue sample is a solid tumor sample.[176] The method of any 
one of paragraphs [170]-[175], wherein thetissue sample is a fixed or frozen sample.[177] The method of any one of paragraphs [170]-[176], wherein thesample containing the plurality of peripheral T cells is a peripheralblood mononuclear cell (PBMC) sample.[178] The method of any one of paragraphs [170]-[177], furthercomprising, prior to (a), obtaining a blood sample from the subject.[179] The method of paragraph [178], further comprising, prior to (a),isolating peripheral blood mononuclear cells from the blood sample.[180] The method of any one of paragraphs [170]-[179], wherein thetissue sample comprises a tumor-infiltrating T cell.[181] A method of identifying a target-reactive T-cell receptor (TCR),comprising: (a) providing a cell comprising the TCR identified from anyone of paragraphs [170]-[180]; and (b) contacting the cell with a targetantigen presented by an antigen-presenting cell (APC), wherein the cellbinds to the target antigen presented by the APC via the TCR, therebyidentifying the TCR as the target-reactive TCR.[182] The method of paragraph [181], wherein the target antigen is atumor antigen.[183] The method of paragraph [181] or [182], further comprisingdelivering a sequence encoding the target-reactive TCR into a host cell.[184] The method of paragraph [183], further comprising administeringthe host cell into the subject.[185] The method of paragraph [183] or [184], wherein the host cell is aT cell.[186] The method of paragraph [185], wherein the T cell is an autologousT cell.[187] The method of paragraph [185], wherein the T cell is an allogeneicT cell.[188] The method of any one of paragraphs [181]-[187], wherein the cellis a reporter cell line, which reporter cell line comprises a reportergene that is expressed upon the cell binding to the target antigenpresented by the APC.
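By way of illustration only, the matching step of paragraph [170] can be sketched in Python: paired CDR3 sequences identified in the peripheral sample are indexed, and each unpaired tissue CDR3 sequence is looked up in that index. The function names and the toy amino acid sequences below are hypothetical and are not taken from the disclosure, and the sketch assumes that "matches" means exact string identity, which is only one possible matching criterion.

```python
from collections import defaultdict

# Hypothetical sketch: names, data layout, and toy sequences are assumptions.
# A "pair" is a (chain_1_cdr3, chain_2_cdr3) tuple from one peripheral T cell.

def index_paired_cdr3s(paired_sequences):
    """Index every natively paired record under each of its two CDR3 sequences."""
    index = defaultdict(set)
    for pair in paired_sequences:
        first_cdr3, second_cdr3 = pair
        index[first_cdr3].add(pair)
        index[second_cdr3].add(pair)
    return index

def match_tissue_cdr3s(tissue_cdr3s, paired_sequences):
    """Return, for each tissue CDR3 with an exact match, the paired records containing it.

    Exact string identity is assumed here; tissue CDR3s with no match are omitted.
    """
    index = index_paired_cdr3s(paired_sequences)
    return {
        cdr3: sorted(index[cdr3])
        for cdr3 in tissue_cdr3s
        if cdr3 in index
    }

if __name__ == "__main__":
    # Toy peripheral-sample pairs and tissue-sample single-chain CDR3s (made up).
    peripheral_pairs = [
        ("CAVRDSNYQLIW", "CASSLGQAYEQYF"),
        ("CAASGGSYIPTF", "CASSPTGGYEQYF"),
    ]
    tissue = ["CASSLGQAYEQYF", "CASSXXXXEQYF"]
    for cdr3, pairs in match_tissue_cdr3s(tissue, peripheral_pairs).items():
        print(cdr3, "->", pairs)
```

A tissue CDR3 sequence may match more than one paired record, so the sketch returns every matching pair rather than selecting one.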

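By way of illustration only, the complementarity between a connector sequence and an anti-connector sequence described in paragraphs [153]-[158] can be sketched in Python. The function names, fragment identifiers, and short toy sequences below are hypothetical and are not taken from the disclosure. The sketch assumes exact reverse-complement identity over the full connector length, which is a simplification of hybridization.

```python
# Hypothetical sketch: identifiers and toy sequences are assumptions.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def pair_by_connector(cdr3_connectors, v_anti_connectors):
    """Map each CDR3-fragment ID to the V-fragment IDs it could link to.

    cdr3_connectors:   {fragment_id: connector sequence}
    v_anti_connectors: {fragment_id: anti-connector sequence}
    A V fragment is a candidate partner when its anti-connector equals the
    reverse complement of the CDR3 fragment's connector (assumed criterion).
    """
    by_anti_connector = {}
    for v_id, anti in v_anti_connectors.items():
        by_anti_connector.setdefault(anti.upper(), []).append(v_id)
    return {
        cdr3_id: by_anti_connector.get(reverse_complement(connector), [])
        for cdr3_id, connector in cdr3_connectors.items()
    }

if __name__ == "__main__":
    # Two V-gene-derived fragments with different anti-connectors (toy data).
    v_fragments = {"V_fragment_1": "CCTGTAATC", "V_fragment_2": "TACGGTCAA"}
    # One CDR3-encoding fragment whose connector complements only the first.
    cdr3_fragments = {"CDR3_fragment_A": "GATTACAGG"}
    print(pair_by_connector(cdr3_fragments, v_fragments))
```

Using a distinct anti-connector for each V-gene-derived fragment is what lets a given connector resolve to a single partner in this sketch.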
What is claimed is: 1.-58. (canceled)
 59. A composition comprising (a) a plurality of nucleic acid molecules, wherein each nucleic acid molecule of the plurality of nucleic acid molecules comprises a sequence derived from a T-cell receptor (TCR) V gene and does not comprise a complementarity determining region 3 (CDR3) sequence, wherein a first nucleic acid molecule of the plurality comprises a first anti-connector sequence and a second nucleic acid molecule of the plurality comprises a second anti-connector sequence, wherein the first anti-connector sequence is different from the second anti-connector sequence, and wherein the sequence derived from a TCR V gene of the first nucleic acid molecule and the second nucleic acid molecule are derived from a different TCR V gene; and (b) at least one nucleic acid molecule comprising a sequence encoding a CDR3 of a TCR chain, wherein the at least one nucleic acid molecule further comprises a first connector sequence complementary to the first anti-connector sequence such that the at least one nucleic acid molecule is capable of specifically linking to the first nucleic acid molecule of (a) comprising the first anti-connector sequence.
 60. The composition of claim 59, wherein the composition is a liquid composition.
 61. The composition of claim 59, wherein the plurality of nucleic acid molecules of (a) and the at least one nucleic acid molecule of (b) are in a same compartment.
 62. The composition of claim 59, wherein the sequence derived from the TCR V gene comprises at least ten nucleotides of the TCR V gene.
 63. The composition of claim 59, wherein the at least one nucleic acid molecule of (b) and the first nucleic acid molecule of (a) are from the same TCR chain expressed in a T cell of a sample from a subject.
 64. The composition of claim 59, wherein the sequence derived from the TCR V gene comprises a sequence encoding a first part of a leader peptide (L-PART1), a second part of the leader peptide (L-PART2), framework 1 (FR1), complementarity determining region 1 (CDR1), framework 2 (FR2), complementarity determining region 2 (CDR2), or framework 3 (FR3).
 65. The composition of claim 59, wherein the at least one nucleic acid molecule of (b) further comprises a J region of the TCR chain.
 66. The composition of claim 59, wherein the at least one nucleic acid molecule of (b) further comprises an additional sequence encoding an additional CDR3 of an additional TCR chain.
 67. The composition of claim 66, wherein the at least one nucleic acid molecule further comprises an additional J region of the additional TCR chain.
 68. The composition of claim 66, wherein the sequence encoding the CDR3 and the additional sequence encoding the CDR3 are separated by at most 100 nucleotides.
 69. The composition of claim 66, wherein the TCR chain and the additional TCR chain are a cognate pair of TCR chains expressed in a T cell of a sample from a subject.
 70. The composition of claim 59, wherein the at least one nucleic acid molecule of (b) comprises two or more nucleic acid molecules, and wherein each nucleic acid molecule of the two or more nucleic acid molecules comprises a sequence encoding a CDR3 of a TCR chain.
 71. The composition of claim 70, wherein each nucleic acid molecule of the two or more nucleic acid molecules encodes a different CDR3 of a different TCR chain.
 72. The composition of claim 70, wherein each nucleic acid molecule of the two or more nucleic acid molecules comprises a different connector sequence, and wherein the different connector sequence is capable of specifically linking to a nucleic acid molecule of the plurality of nucleic acid molecules of (a) that comprises a sequence derived from any single given TCR V gene.
 73. The composition of claim 59, wherein the first anti-connector sequence or the second anti-connector sequence comprises a TCR V gene sequence.
 74. The composition of claim 73, wherein the TCR V gene sequence comprises at least three nucleotides of the TCR V gene adjacent to a sequence encoding a CDR3 in a rearranged gene.
 75. The composition of claim 59, wherein the first anti-connector sequence or the second anti-connector sequence comprises a pre-determined sequence that does not comprise a portion of the TCR V gene.
 76. The composition of claim 59, wherein the first connector sequence hybridizes to the first anti-connector sequence.
 77. The composition of claim 59, wherein the at least one nucleic acid molecule of (b) comprises a unique barcode.
 78. The composition of claim 77, wherein the unique barcode is a primer binding site.